JP4686048B2

JP4686048B2 - Pixel arithmetic unit

Info

Publication number: JP4686048B2
Application number: JP2001125119A
Authority: JP
Inventors: 広之岡; 督三清原; 誠平井; 浩三木村; 康介 ▲よし▼岡; 英志西田; 隆治松浦; 広之森下; 敏昭辻
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2000-04-21
Filing date: 2001-04-23
Publication date: 2011-05-18
Anticipated expiration: 2021-04-23
Also published as: JP2002008025A

Description

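[0001]
[Technical field of the invention]
The present invention relates to a pixel arithmetic device including a filtering circuit for resizing an image.
[0002]
[Prior art]
In recent years digital video equipment has advanced remarkably, and so-called media processors that handle compression and decompression of moving pictures, resizing, and the like have been put into practical use.
FIR (finite impulse response) filters are often used for resizing images.
[0003]
FIG. 1 is a block diagram showing an example of a circuit that performs FIR filter processing in the prior art. The figure shows an FIR filter with seven taps and symmetric coefficients.
In the figure, data input in time series from the data input terminal 1001 is transferred through the delay units 1002, 1003, 1004, 1005, 1006, and 1007 in this order. When the filter coefficients are symmetric, that is, when the coefficients corresponding to the input of the data input terminal and to the outputs of the delay units (called taps) are symmetric about the center tap (the output of the delay unit 1004), the data of the taps that share a coefficient are added first and the sum is then multiplied by that coefficient, instead of multiplying the data of every tap by its coefficient.
[0004]
For example, the input data of the data input terminal 1001 and the output data of the delay unit 1007 are added by the adder 1008, and the sum is multiplied by the coefficient h0 in the multiplier 1011. The output of the delay unit 1002 and the output of the delay unit 1006 are added by the adder 1009, and the sum is multiplied by the coefficient h1 in the multiplier 1012.
The output data of the multipliers 1011 to 1014 are added by the adder 1015. The output data of the adder 1015 is output in time series from the data output terminal 1016 as the filter result. The coefficients h0 to h3 are determined according to the reduction ratio of the image. For example, if the reduction ratio is 1/2, a reduced image is obtained by thinning the time-series output data to 1/2.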
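The folding of symmetric taps described above can be sketched as follows. This is a minimal illustration, not the circuit of FIG. 1 itself: the function names are hypothetical, `h` is assumed to be ordered (h0, h1, h2, h3) with h3 as the center coefficient, and decimation by 2 stands in for the 1/2 reduction.

```python
def symmetric_fir7(samples, h):
    """7-tap symmetric FIR with pre-addition: taps sharing a coefficient are summed first."""
    h0, h1, h2, h3 = h
    out = []
    for n in range(6, len(samples)):
        w = samples[n - 6:n + 1]              # the seven taps at this instant
        out.append(h0 * (w[0] + w[6])         # outermost pair, one multiply
                   + h1 * (w[1] + w[5])
                   + h2 * (w[2] + w[4])
                   + h3 * w[3])               # center tap has no partner
    return out


def direct_fir7(samples, h):
    """Reference form: one multiply per tap (seven multiplies instead of four)."""
    coeffs = [h[0], h[1], h[2], h[3], h[2], h[1], h[0]]
    return [sum(c * x for c, x in zip(coeffs, samples[n - 6:n + 1]))
            for n in range(6, len(samples))]


def decimate_by_two(values):
    """Keep every second output sample, as in a 1/2 reduction."""
    return values[::2]
```

Both forms give the same result; the folded form only reduces the number of multiplications, and it still consumes one input sample per clock.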
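[0005]
Symmetric filter coefficients are chosen because they give linear phase (a phase characteristic that is linear in frequency), which is visually preferable for images.
[0006]
[Problems to be solved by the invention]
However, when the conventional method filters image data, the circuit structure requires the pixel data making up the image to be input in order from the edge, so only one pixel datum can be input per clock, and raising the processing speed requires raising the operating frequency. Operation at a high frequency increases cost and power consumption.
[0007]
In addition, in the conventional method the circuit differs for each number of taps, so there is no flexibility, and providing a separate circuit for each number of taps would be enormously expensive.
A first object of the present invention is to provide a pixel arithmetic device that performs filtering with a variable number of taps and speeds up processing without raising the frequency.
[0008]
A second object of the present invention is to provide a pixel arithmetic device that can be used not only for filtering but also for MC (motion compensation) processing, with a reduced circuit scale.
A third object is to provide a pixel arithmetic device that can be used not only for filtering but also for ME (motion estimation) processing, with a reduced circuit scale.
[0009]
A fourth object is to provide a pixel arithmetic device that can be used not only for filtering but also for OSD (on-screen display) processing in digital video equipment, with a reduced circuit scale.
[0010]
[Means for solving the problems]
A pixel arithmetic device that achieves the first object performs filter processing and comprises N pixel processing means, supply means for supplying N pixel data and filter coefficients, and control means for operating the N pixel processing means in parallel.
Each pixel processing means performs an operation using the pixel data and the filter coefficient supplied by the supply means, then acquires pixel data from the adjacent pixel processing means, performs an operation using the acquired pixel data, and accumulates the results. The control means controls the N pixel processing means so that the acquisition of pixel data from the adjacent pixel processing means, and the operation and accumulation using the acquired pixel data, are repeated a number of times that depends on the number of taps.
[0011]
Here, the N pixel processing means form a first shifter that shifts N pixel data to the right and a second shifter that shifts N pixel data to the left. Each pixel processing means performs its operation using the two pixel data shifted out from its two adjacent pixel processing means.
A pixel arithmetic device that achieves the second object supplies, as the pixel data, pixel data of a difference image and pixel data of a reference frame from the supply means.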
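[0012]
[Embodiments of the invention]
The pixel operation unit of the present invention is configured to selectively execute mainly (a) filter processing used for enlarging and reducing images, (b) motion compensation (hereinafter MC) processing, (c) OSD (on-screen display) processing, and (d) motion estimation (hereinafter ME) processing. For (a) filter processing, the pixel operation unit makes the number of taps variable rather than fixed, and processes in parallel a plurality of pixels (for example, 16 pixels) that are consecutive in the horizontal or vertical direction. Furthermore, vertical filter processing is performed in synchronization with the decompression of compressed moving picture data.
[0013]
The pixel operation unit in the embodiment of the present invention is described below in the following order.
1 Configuration of the media processor
1.1 Configuration of the pixel operation unit
1.2 Configuration of the pixel parallel processing unit
2.1 Filter processing
2.2 MC (motion compensation) processing
2.3 OSD (on-screen display) processing
2.4 ME (motion estimation) processing
3.1 Vertical filter processing (part 1)
3.1.1 1/2 reduction
3.1.2 1/4 reduction
3.2 Vertical filter processing (part 2)
3.2.1 1/2 reduction
3.2.2 1/4 reduction
4 Modifications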
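<1 Configuration of the media processor>
The following describes the case where the pixel operation unit of this embodiment is built into a media processor that performs media processing (decompression of compressed audio and video data, compression of audio and video data, and so on). The media processor is mounted in, for example, a set-top box that receives digital TV broadcasts, a television receiver, or a DVD recording and playback device.
[0014]
FIG. 2 is a block diagram showing the configuration of a media processor that includes the pixel operation unit. In the figure, the media processor 200 comprises a dual port memory 100, a stream unit 201, an input/output buffer (hereinafter I/O buffer) 202, a setup processor 203, a bitstream FIFO 204, a variable length decoder (VLD) 205, a transfer engine (hereinafter TE) 206, a pixel operation unit A (hereinafter POUA) 207, a pixel operation unit B (hereinafter POUB) 208, a POUC 209, an audio unit 210, an input/output processor (hereinafter IOP) 211, a video buffer memory 212, a video unit 213, a host unit 214, an RE 215, and a filter unit 216.
[0015]
The dual port memory 100 has an input/output port to the external memory 220 (hereinafter the external port), an input/output port to the inside of the media processor 200 (hereinafter the internal port), and a cache memory. Through the internal port it accepts access requests from those components of the media processor 200 that read and write data in the external memory 220 (hereinafter master devices), and it accesses the external memory 220 according to the accepted requests. In doing so, the dual port memory 100 caches part of the data of the external memory 220 in its internal cache memory. The external memory 220 is a memory such as SDRAM or RDRAM, and temporarily stores compressed video data, compressed audio data, decoded audio data, decoded video data, and the like.
[0016]
The stream unit 201 receives stream data (a so-called MPEG stream) from outside, separates it into a video elementary stream and an audio elementary stream, and writes each into the I/O buffer 202.
The I/O buffer 202 is a buffer memory that temporarily holds the video elementary stream, the audio elementary stream, and audio data (decompressed audio data). The video and audio elementary streams are stored from the stream unit 201 into the I/O buffer 202, and are then stored in the external memory 220 via the dual port memory 100 under control of the IOP 211. The audio data is stored from the external memory 220 into the I/O buffer 202 via the dual port memory 100 under control of the IOP 211.
[0017]
The setup processor 203 decodes (decompresses) the audio elementary stream and analyzes the macroblock headers of the video elementary stream. The audio and video elementary streams are transferred from the external memory 220 to the bitstream FIFO 204 via the dual port memory 100 under control of the IOP 211. The setup processor 203 reads the audio elementary stream from the bitstream FIFO 204, decodes it, and stores the decoded audio data in the setup memory 217. The audio data in the setup memory 217 is transferred by the IOP 211 to the external memory 220 via the dual port memory 100. The setup processor 203 also reads the video elementary stream from the bitstream FIFO 204, analyzes the macroblock headers, and notifies the VLD 205 of the analysis result.
[0018]
The bitstream FIFO 204 is a FIFO memory for supplying the video elementary stream to the variable length decoder 205 and the audio elementary stream to the setup processor 203. The video and audio elementary streams are transferred from the external memory 220 to the bitstream FIFO 204 via the dual port memory 100 under control of the IOP 211.
[0019]
The VLD 205 decodes the variable length codes contained in the video elementary stream supplied from the bitstream FIFO 204. The decoding result is a group of DCT coefficients for each macroblock.
The TE 206 performs IQ (inverse quantization) and IDCT (inverse DCT) processing on the decoding result of the VLD 205, macroblock by macroblock. The result of this processing is a macroblock. One macroblock consists of four luminance blocks (Y1 to Y4) and two chrominance blocks (Cb, Cr). One block is 8 x 8 pixels. For P pictures and B pictures, however, each block is output from the TE 206 as 8 x 8 difference values. The TE 206 stores the result in the external memory 220 via the dual port memory 100.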
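[0020]
The POUA 207 selectively executes mainly (a) filter processing, (b) MC processing, (c) OSD processing, and (d) motion estimation (ME) processing.
In (a) filter processing, the POUA 207 filters in parallel 16 pixel data contained in the video data (frame data) stored in the external memory 220, and reduces or enlarges the image by thinning out or interpolating the 16 filtered pixels. The data after reduction is stored in the external memory 220 via the dual port memory 100 under control of the POUC 209.
[0021]
In (b) MC processing, the POUA 207 adds, 16 in parallel, the IQ and IDCT results for P pictures and B pictures stored in the external memory 220 by the TE 206 (that is, difference values of pixel data) and the pixel data in the reference frame. The 16 pairs of difference values and pixel data are input to the POUA 207 by the POUC 209 according to the motion vector detected by the macroblock header analysis in the setup processor 203.
[0022]
In (c) OSD processing, the POUA 207 receives an OSD image (still image) stored in the external memory 220 or elsewhere via the dual port memory 100, and overwrites it onto the display frame data in the external memory 220. Here an OSD image means a menu image displayed in response to a user's remote control operation, a time display, a channel number display, and the like.
[0023]
(d) ME processing searches a reference frame for a rectangular area highly correlated with the macroblock to be encoded in uncompressed frame data, and obtains the motion vector pointing from the macroblock to be encoded to the most highly correlated rectangular area. The POUA 207 calculates, 16 in parallel, the differences between the pixels of the macroblock to be encoded and the pixels of a rectangular area in the search area.
[0024]
The POUB 208 has the same configuration as the POUA 207, and the two share the processing (a) to (d) dynamically.
The POUC 209 controls the supply of pixel data groups to the POUA 207 and the POUB 208, and the transfer of processing results to the external memory 220.
The audio unit 210 outputs the audio data stored in the I/O buffer 202.
[0025]
The IOP 211 controls data input/output (data transfer) within the media processor 200. There are the following kinds of data transfer. The first transfers the stream data stored in the I/O buffer 202 to the stream buffer area in the external memory 220 via the dual port memory 100. The second transfers the video and audio elementary streams stored in the external memory 220 to the bitstream FIFO 204 via the dual port memory 100. The third transfers the audio data stored in the external memory 220 to the I/O buffer 202 via the dual port memory 100.
[0026]
The video unit 213 reads two or three lines of pixel data from the video data (image frame) in the external memory 220, stores them in the video buffer memory 212, converts those lines of pixel data into a video signal, and outputs it to an externally connected display device such as a television receiver.
The host unit (HOST) 214 receives instructions from an external host microcomputer and, according to the instructions, controls the start and end of MPEG decoding, MPEG encoding, OSD processing, reduction and enlargement processing, and so on.
[0027]
The rendering engine (RE) 215 is a master device and performs rendering processing for computer graphics. When a dedicated LSI 218 is connected externally, it exchanges data with that LSI.
The filter unit 216 enlarges and reduces still image data. When a dedicated LSI 218 is connected externally, it exchanges data with that LSI.
[0028]
The above description has centered on the case where the media processor receives stream data from the stream unit 201 and decodes (decompresses) it. When uncompressed video data and audio data are encoded (compressed), the flow is reversed. In that case the POUA 207 (or POUB 208) performs ME processing, the TE 206 performs DCT and Q (quantization) processing, and the VLD 205 performs variable length encoding.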
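<1.1 Configuration of the pixel operation unit>
FIG. 3 is a block diagram showing the configuration of the pixel operation unit.
[0029]
Since the POUA 207 and the POUB 208 have the same configuration, the POUA 207 is described here.
As shown in the figure, the POUA 207 comprises a pixel parallel processing unit 21, an input buffer group 22, an output buffer group 23, an instruction memory 24, an instruction decoder 25, an instruction circuit 26, and a DDA circuit 27.
[0030]
The pixel parallel processing unit 21 comprises a pixel transfer unit 17, sixteen pixel processing units 1 to 16, and a pixel transfer unit 18. It performs (a) filter processing, (b) MC processing, (c) OSD processing, and (d) ME processing on the plurality of pixels input from the input buffer group 22, and outputs the results to the output buffer group 23. Each of the processes (a) to (d) is completed in units of macroblocks, that is, by repeating 16 pixels 16 times (16 lines). The start of each process is controlled by the POUC 209. In filter processing, the pixel transfer unit 17 holds a plurality of pixels (here, 8 pixels) further to the left of (or above) the 16 pixels and shifts them right every clock. The pixel transfer unit 18 holds a plurality of pixels (here, 8 pixels) further to the right of (or below) the 16 pixels and shifts them out to the left every clock.
[0031]
The input buffer group 22 holds the plurality of pixels to be processed, which are transferred from the dual port memory 100 under control of the POUC 209, and in filter processing it also holds the filter coefficients.
The output buffer group 23 temporarily holds the processing results of the pixel parallel processing unit 21 (16 results corresponding to 16 pixels), with their order rearranged arbitrarily. In filter processing, it thins out pixels (when reducing) or interpolates pixels (when enlarging) by holding them in a rearranged order.
[0032]
The instruction memory 24 stores a microprogram for filter processing (filter µP), a microprogram for MC processing (MC µP), a microprogram for OSD processing (OSD µP), and a microprogram for ME processing (ME µP). The instruction memory 24 also stores a microprogram for converting the macroblock format, a microprogram for converting the numerical representation of pixels, and so on. Here the macroblock format means the ratio of the sampling rates of the pixels of the Y, Cb, and Cr blocks defined in the MPEG standard, such as "4:2:0", "4:2:2", and "4:4:4". As for the numerical representation, pixel values may be expressed as 0 to 255 (typical MPEG data and the like) or as -128 to 127 (DV cameras and the like).
[0033]
The instruction decoder 25 sequentially reads and decodes the microcode of a microprogram from the instruction memory 24, and controls each part of the POUA 207 according to the decoded result.
The instruction circuit 26 receives from the POUC 209 an indication (a start address or the like) of which microprogram in the instruction memory 24 should be started, and starts the indicated microprogram.
[0034]
In filter processing, the DDA circuit 27 controls the selection of the filter coefficient group held in the input buffer group 22.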
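<1.2 Configuration of the pixel parallel processing unit>
FIGS. 4 and 5 are block diagrams showing the detailed configuration of the left half and the right half of the pixel parallel processing unit.
[0035]
In FIG. 4, the pixel transfer unit 17 consists of eight input ports A1701 to H1708, eight delay units A1701 to H1709 each of which holds pixel data and delays it by one clock period, and seven selection units A1717 to G1723 each of which selects either the pixel data of an input port or the output of the delay unit to its left. It holds in the eight delay units the eight pixels input in parallel from the input buffer group 22, and functions as a right shifter that shifts the held pixels to the right in synchronization with the clock.
[0036]
In FIG. 5, the pixel transfer unit 18 differs from the pixel transfer unit 17 in that it shifts to the left. Otherwise the configuration is the same, so its description is omitted.
The sixteen pixel processing units 1 to 16 in FIGS. 4 and 5 all have the same configuration, so the pixel processing unit 2 is described as a representative.
The pixel processing unit 2 consists of input ports A201 to C203, selection units A204 and B205, delay units A206 to D209, an adder A120, a multiplier A211, an adder B212, and an output port D213.
[0037]
The selection unit A204 selects either the pixel data input from the input port A201 or the pixel data shifted out from the right neighbor, the pixel processing unit 3.
The selection unit A204 and the delay unit A206 also serve to shift the pixel data input from the right neighbor, the pixel processing unit 3, out to the left neighbor, the pixel processing unit 1.
The selection unit B205 selects either the pixel data input from the input port B202 or the pixel data shifted out from the left neighbor, the pixel processing unit 1.
[0038]
The selection unit B205 and the delay unit B207 also serve to shift the pixel data input from the left neighbor, the pixel processing unit 1, out to the right neighbor, the pixel processing unit 3.
The delay units A206 and B207 hold the pixel data selected by the selection units A204 and B205, respectively.
The delay unit C208 holds the pixel data from the input port C203.
The adder A120 adds the pixel data output from the delay unit A206 and the delay unit B207.
[0039]
The multiplier A211 multiplies the sum from the adder A120 by the data from the delay unit C208. In filter processing, the multiplier A211 is used to multiply pixel data by a filter coefficient.
The adder B212 adds the product from the multiplier A211 and the data of the delay unit D209.
[0040]
The delay unit D209 accumulates the sum from the adder B212.
The pixel processing unit 2 executes (a) filter processing, (b) MC processing, (c) OSD processing, and (d) ME processing by selectively combining and operating these components. The selective combination is performed by microprogram control through the instruction memory 24 and the instruction decoder 25.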
【００４１】
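FIG. 6(a) is a block diagram showing the detailed configuration of the input buffer group 22.
As shown in the figure, the input buffer group 22 consists of eight latches 221 that supply pixel data to the pixel transfer unit 17, sixteen latch units 222 that supply pixel data to the pixel processing units 1 to 16, and eight latches 223 that supply pixel data to the pixel transfer unit 18. Pixel data groups are transferred to these from the external memory 220 via the dual port memory 100 under control of the POUC 209.
[0042]
Each latch unit 222 consists of two latches that supply pixel data to the input ports A and B of a pixel processing unit, and a selection unit 224 that supplies pixel data or a filter coefficient to the input port C of the pixel processing unit.
FIG. 6(b) is a block diagram showing the detailed configuration of the selection unit 224.
As shown in the figure, the selection unit 224 consists of eight latches 224a to 224h and a selector 224i that selects the data of one of the eight latches.
[0043]
In filter processing, the latches 224a to 224h hold the filter coefficients a0 to a7 (or a0/2, a1 to a7). These filter coefficients are transferred by the POUC 209 from the external memory 220 to the latches 224a to 224h via the dual port memory 100.
The selector 224i selects the latches 224a to 224h one after another in synchronization with the clock, under control of the DDA circuit 27. Because the supply of filter coefficients to the pixel processing units is controlled in hardware by the DDA circuit 27 rather than directly by microcode, it is fast.
[0044]
FIG. 7 is a block diagram showing the configuration of the output buffer group 23.
As shown in the figure, the output buffer group 23 consists of sixteen selectors 24a to 24p and sixteen latches 23a to 23p.
Each of the selectors 24a to 24p receives the sixteen processing results of the pixel processing units 1 to 16 and selects one of them. This selection is controlled by the instruction decoder 25.
[0045]
The latches 23a to 23p hold the selection results of the selectors 24a to 24p, respectively.
For example, when the filter result is reduced to 1/2, the eight selectors 24a to 24h select, from the sixteen results of the pixel processing units 1 to 16 for sixteen pixels, the results of the pixel processing units 1, 3, 5, ..., 15, and these are stored in the latches 23a to 23h. Then the eight selectors 24i to 24p select, from the sixteen results of the pixel processing units 1 to 16 for the next sixteen pixels, the results of the pixel processing units 2, 4, 6, ..., 16, and these are stored in the latches 23i to 23p. In this way the pixels are thinned out, the sixteen pixel data reduced to 1/2 are held in the output buffer group 23, and they are then transferred to the external memory 220 via the dual port memory 100 under control of the POUC 209.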
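<2.1 Filter processing>
The filter processing in the pixel operation unit is described in detail.
[0046]
The POUC 209 identifies the macroblock to be filtered, transfers 32 pixel data and the filter coefficients a0/2, a1 to a7 as initial values to the input buffer group 22 of the POUA 207 or the POUB 208, and instructs the instruction circuit 26 to start filter processing, notifying it of the number of taps.
FIG. 8 shows the initial input values of the pixel data when the pixel operation unit (POUA 207) performs filter processing. In the figure, the input port column refers to the input ports shown in FIGS. 4 and 5. The input pixel column refers to the pixel data supplied from the input buffer group 22 to each input port. The output port column refers to the output port D (output of the adder B) shown in FIGS. 4 and 5, and the output pixel column to its output value.
[0047]
As shown in FIG. 9, the input buffer group 22, which supplies pixel data to the input ports, holds 32 horizontally consecutive pixel data X1 to X32 transferred by the POUC 209. The targets of filter processing here are the 16 pixel data X9 to X24. As shown in FIG. 8, the pixel data X9 to X24 are supplied to the input ports A and B of the pixel processing units 1 to 16, and the filter coefficient a0/2 selected in the input buffer group 22 is supplied to the input port C, as initial values.
[0048]
After the initial input values have been supplied from the input buffer group 22 to the pixel parallel processing unit 21, filter processing is performed by inputting a number of clocks that depends on the desired number of taps.
FIG. 10 is an explanatory diagram showing the calculation process of the pixel processing unit 1 as a representative of the 16 pixel processing units. For each input clock, the figure lists the contents held in the delay units A to D of the pixel processing unit 1 and the output value of the adder B. FIG. 11 shows the output value of the output port D (output of the adder B) of the pixel processing unit 1 for each clock input.
At the first clock input (CLK1), the pixel processing unit 1 holds as initial input values the pixel data X9 in the delay units A and B and the filter coefficient a0/2 in the delay unit C, and the delay unit D is cleared to 0. At this time the selection units A and B both select their input ports. As a result the adder A outputs (X9+X9), the multiplier A outputs (X9+X9)*a0/2, and the adder B outputs (X9+X9)*a0/2+0 (that is, a0*X9) (see FIG. 11).
[0049]
From the second clock input (CLK2) onward, the selection units A and B select not the input ports A and B but the shift outputs from the adjacent pixel processing unit or pixel transfer unit.
At the second clock input (CLK2), the delay units A to D hold the pixel data X10 and X8, the filter coefficient a1, and a0*X9, respectively. As a result the adder B outputs a0*X9+a1(X10+X8) (see FIG. 11). Thus at the second clock the multiplier A multiplies the filter coefficient a1 (delay unit C) by the sum of the pixel data shifted out from both neighbors (adder A), and the adder B adds this product to the accumulated value in the delay unit D.
[0050]
At the third clock input (CLK3), the pixel processing unit 1 operates in the same way as at the second clock input, and the adder B outputs a0*X9+a1(X10+X8)+a2(X11+X7).
At the fourth to ninth clock inputs (CLK4 to CLK9) it operates in the same way, and the adder B outputs the respective values shown in FIG. 11.
[0051]
In this way, with nine clocks the filter result (output data) of the pixel processing unit 1 is
a0・X9+a1(X10+X8)+a2(X11+X7)+a3(X12+X6)
+a4(X13+X5)+a5(X14+X4)+a6(X15+X3)+a7(X16+X2)+a8(X17+X1)
[0052]
FIGS. 10 and 11 show the process up to CLK9, but the clock input is cut off under control of the instruction decoder 25 according to the number of taps notified by the POUC 209. That is, each pixel processing unit finishes filter processing at CLK2 for 3 taps, at CLK3 for 5 taps, and at CLK4 for 7 taps. In other words, filter processing with (2n-1) taps finishes after n clock inputs.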
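The clock-by-clock behavior described above can be modeled in software roughly as follows. This is a behavioral sketch under stated assumptions, not the hardware: the function name and the list-based shift lanes are illustrative, `coeffs` is assumed to hold a0, a1, ... (the halving of a0 is applied inside the function), and at most 8 shifts are possible because each pixel transfer unit holds 8 pixels.

```python
def parallel_symmetric_filter(pixels, coeffs, taps):
    """Model 16 pixel processing units filtering X9..X24 of a 32-pixel row."""
    assert len(pixels) == 32 and taps % 2 == 1
    clocks = (taps + 1) // 2                  # (2n-1) taps finish in n clocks
    assert clocks <= len(coeffs) and clocks <= 9
    left_q = list(pixels[:8])                 # pixel transfer unit 17 (right shifter)
    right_q = list(pixels[24:])               # pixel transfer unit 18 (left shifter)
    lane_a = list(pixels[8:24])               # delay A of each unit, shifts left
    lane_b = list(pixels[8:24])               # delay B of each unit, shifts right
    # CLK1: both lanes hold the target pixel, port C holds a0/2, delay D is 0.
    acc = [(a + b) * coeffs[0] / 2 for a, b in zip(lane_a, lane_b)]
    for k in range(1, clocks):                # CLK2 onward: take neighbors' shift outputs
        lane_a = lane_a[1:] + [right_q.pop(0)]
        lane_b = [left_q.pop()] + lane_b[:-1]
        for i in range(16):
            acc[i] += coeffs[k] * (lane_a[i] + lane_b[i])
    return acc
```

For example, `parallel_symmetric_filter(list(range(1, 33)), [4, 3, 2, 1], 7)` runs four clocks; element 0 corresponds to the pixel processing unit 1 and equals a0*X9+a1(X10+X8)+a2(X11+X7)+a3(X12+X6).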
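[0053]
The instruction decoder 25 repeats the parallel processing of 16 pixels for 16 lines, which completes the filter processing of four blocks. The 16 filter results are reduced or enlarged by thinning or interpolation in the output buffer group 23. Each time 16 pixels after reduction or enlargement are held in the output buffer group 23, they are transferred to the external memory 220 via the dual port memory 100 under control of the POUC 209. The instruction decoder 25 also notifies the POUC 209 of completion at the end of the 16th line. For the next macroblock the POUC 209, in the same way as above, supplies the initial input values and filter coefficients to the POUA 207 and instructs it to start filter processing.
[0054]
With nine clocks, the filter result of the pixel processing unit 2 is given by the following expression.
a0・X10+a1(X11+X9)+a2(X12+X8)+a3(X13+X7)
+a4(X14+X6)+a5(X15+X5)+a6(X16+X4)+a7(X17+X3)+a8(X18+X2)
With nine clocks, the filter result of the pixel processing unit 3 is given by the following expression.
a0・X11+a1(X12+X10)+a2(X13+X9)+a3(X14+X8)
+a4(X15+X7)+a5(X16+X6)+a6(X17+X5)+a7(X18+X4)+a8(X19+X3)
The filter results of the pixel processing units 4 to 16 are the same except for the pixel positions, so they are omitted.
[0055]
In this way the pixel parallel processing unit 21 filters 16 input pixels in parallel, and the number of taps can be set arbitrarily by controlling the number of input clocks.
In FIG. 8 the input pixels at the input ports A, B, and C of the pixel processing unit 1 are (X9, X9, a0/2), but they may instead be (X9, 0, a0) or (0, X9, a0). The same applies to the pixel processing units 2 to 16, with only the target pixel differing.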
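<2.2 MC (motion compensation) processing>
MC processing when the frame to be decoded is a P picture is described in detail.
[0056]
The POUC 209 instructs the instruction circuit 26 to start MC processing, identifies the macroblock (difference values) in the frame being decoded that is the target of MC processing and the rectangular area in the reference frame pointed to by the motion vector, and sets 16 difference values D1 to D16 and 16 pixel data P1 to P16 of the rectangular area in the input buffer group 22 of the POUA 207 or the POUB 208.
[0057]
FIG. 12 shows the input and output pixel data when the pixel operation unit performs MC processing (P picture). In the figure, the input port column refers to the input ports of the pixel transfer unit 17, the pixel processing units 1 to 16, and the pixel transfer unit 18 shown in FIGS. 4 and 5. The input pixel column refers to the pixel data input to the input ports. Since the pixel transfer units 17 and 18 are not used in MC processing, their input pixels may be anything (don't care). The output port column refers to the output port D (output of the adder B) shown in FIGS. 4 and 5, and the output pixel column to its output value.
[0058]
FIG. 13 is an explanatory diagram of the input pixels to the pixel processing units 1 to 16 in MC processing. As shown in the figure, D1 to D16 are 16 difference values in a macroblock (MB) of the frame to be decoded. P1 to P16 are 16 pixel data in the rectangular area of the reference frame pointed to by the motion vector.
In MC processing, the selection units A and B in the pixel processing units 1 to 16 always select the input ports A and B, respectively. The pixel data from the input port A and the difference value from the input port B are therefore input to and held in the delay units A and B through the selection units A and B, and are added by the adder A. The sum is multiplied by 1 in the multiplier, has 0 added by the adder B, and is output from the output port D. That is, the pixel data from the input port A and the difference value from the input port B are simply added and output from the output port D.
[0059]
The 16 sums are then stored in the output buffer group 23 and written back by the POUC 209 to the frame to be decoded in the external memory 220 via the dual port memory 100.
MC processing is performed by repeating the above for every 16 pixels of the frame to be decoded. Since each pixel processing unit performs only a simple addition, the sums for 16 pixels are obtained every clock.
[0060]
Next, MC processing when the frame to be decoded is a B picture is described.
FIG. 14 shows the input and output pixel data when the pixel operation unit performs MC processing (B picture). The input port, input pixel, output port, and output pixel columns in the figure are the same as in FIG. 12, except that the input pixels are input in two parts, at the first clock (CLK1) and at the second clock (CLK2).
[0061]
P1 to P16 and B1 to B16 are 16 pixel data in the rectangular areas pointed to by the motion vectors in two different reference frames.
In MC processing, the selection units A and B in the pixel processing units 1 to 16 always select the input ports A and B, respectively. At the first clock (CLK1), P1 and B1 are held in the delay units A and B from the input ports A and B through the selection units A and B, and at the same time the constant 1/2 from the input port C is held in the delay unit C. The multiplier A thus yields (P1+B1)/2. At the second clock (CLK2), the product (P1+B1)/2 is held in the delay unit D, and at the same time (1, 0, D1) from the input ports A, B, and C are held in the delay units A, B, and C, so the adder B adds D1 from the multiplier A and (P1+B1)/2 from the delay unit D. As a result (P1+B1)/2+D1 is output from the output port.
[0062]
The 16 sums are then stored in the output buffer group 23 and written back by the POUC 209 to the frame to be decoded in the external memory 220 via the dual port memory 100.
MC processing for a B picture is performed by repeating the above for every 16 pixels of the frame to be decoded.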
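The two MC cases above reduce to simple per-pixel arithmetic, which can be sketched as follows. The function names are hypothetical, the list comprehensions stand in for the 16 parallel pixel processing units, and no rounding or clipping is applied because the text does not describe any.

```python
def mc_p_picture(ref_pixels, diffs):
    """P picture: reference pixel plus difference value, one clock per 16 pixels."""
    return [(p + d) * 1 + 0 for p, d in zip(ref_pixels, diffs)]


def mc_b_picture(ref_a, ref_b, diffs):
    """B picture: two clocks per 16 pixels, mirroring the (A, B, C) port inputs."""
    out = []
    for p, b, d in zip(ref_a, ref_b, diffs):
        held = (p + b) * 0.5           # CLK1: ports (P, B, 1/2), result goes to delay D
        out.append((1 + 0) * d + held) # CLK2: ports (1, 0, D), adder B adds delay D
    return out
```

For instance, `mc_b_picture([100], [50], [2])` averages the two reference pixels to 75 and then adds the difference value 2.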
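<2.3 OSD (on-screen display) processing>
The POUC 209 instructs the instruction circuit 26 to start OSD processing, and sequentially reads 16 pixel data X1 to X16 from the OSD image held in the external memory 220 and sets them in the input buffer group 22.
[0063]
FIG. 15 shows the input and output pixel data when the pixel operation unit performs OSD (on-screen display) processing.
In the figure, the pixel transfer units 17 and 18 are not used. The pixel data X1 to X16 are input from the input buffer group 22 to the input ports A of the pixel processing units 1 to 16, 0 is input to each input port B, and 1 is input to each input port C. FIG. 16 shows how 16 pixels at a time of the OSD image are written sequentially into the input buffer group 22.
[0064]
In OSD processing, the selection units A and B in the pixel processing units 1 to 16 always select the input ports. For example, in the pixel processing unit 1, the pixel data X1 of the input port A and the "0" of the input port B are held in the delay units A and B, respectively, and added by the adder A (X1+0=X1). The sum is multiplied in the multiplier A by the "1" input from the input port C, and "0" is added in the adder B. As a result, the pixel data X1 of the input port A is output unchanged from the adder B. Likewise, the pixel data X2 to X16 of the input ports A of the pixel processing units 2 to 16 are output unchanged from the adder B.
[0065]
The pixel data X1 to X16 output from the adder B are stored in the output buffer group 23 and then written by the POUC 209 over the display frame data in the external memory 220 via the dual port memory 100.
By repeating the above over the entire OSD image as shown in FIG. 16, the OSD image in the external memory 220 is copied over the display frame data. This is the simplest kind of OSD processing; the POUA 207 or the POUB 208 merely relays the OSD image 16 pixels at a time.
[0066]
As another form of OSD processing, (1) the OSD image and the display frame data may be blended. When the blend ratio is 0.5, it suffices to supply from the input buffer group 22 the pixel data of the OSD image to each input port A of the pixel processing units 1 to 16 and the pixel data of the display frame data to each input port B.
When the blend ratio is α : (1-α), it suffices to supply from the input buffer group 22 to the input ports A, B, and C of each pixel processing unit (pixel data of the OSD image, 0, α) at the first clock and (0, pixel data of the display frame data, 1-α) at the second clock.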
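The two-clock blend with ratio α : (1-α) can be illustrated as below. This is a sketch under the assumption that α is a plain floating-point value; the function name is hypothetical and the accumulation in `acc` stands in for the delay unit D.

```python
def osd_blend(osd_pixels, frame_pixels, alpha):
    """Blend OSD and display pixels in two clocks per group of pixels."""
    out = []
    for o, f in zip(osd_pixels, frame_pixels):
        acc = (o + 0) * alpha            # CLK1: ports (OSD pixel, 0, alpha)
        acc += (0 + f) * (1 - alpha)     # CLK2: ports (0, display pixel, 1 - alpha)
        out.append(acc)
    return out
```

With `alpha = 1` this degenerates to the plain overwrite copy of the OSD image; with `alpha = 0.5` it gives the same value as the single-clock case in which both pixels are added and multiplied by 1/2.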
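[0067]
When the OSD image is to be displayed reduced, the filter processing described above may be applied to the OSD image from the input buffer group 22, and the result may be copied from the output buffer group 23 over the position in the display frame data where the reduced image is to be displayed.
The OSD image may also be reduced by filter processing and then blended as described above.
<2.4 ME (motion estimation) processing>
FIG. 17 shows the input and output pixel data when the pixel operation unit performs ME (motion estimation) processing. In the input pixel column of the figure, X1 to X16 are 16 pixels of a macroblock in the frame to be encoded, and R1 to R16 are 16 pixels in a 16 x 16 pixel rectangular area of the reference frame. FIG. 18 is an explanatory diagram showing the relation between these pixels. The motion vector (MV) search range in the reference frame in the figure is the range in which the motion vector is searched for, around the same position as the macroblock to be encoded (for example, +16 to -16 pixels horizontally and vertically). Within this MV search range, 16 x 16 pixel rectangular areas exist at 16 x 16 positions for a search in units of pixels, and at 32 x 32 positions for a search in units of half pels (1/2 pixel). FIG. 18 shows only the upper-left rectangular area in the MV search range.
[0068]
ME processing obtains, for each rectangular area in the MV search range, the sum of the pixel-by-pixel differences between that area and the macroblock to be encoded, and determines as the motion vector the relative displacement between the macroblock to be encoded and the rectangular area with the smallest sum (that is, the most highly correlated rectangular area). The difference between the block to be encoded and the most highly correlated rectangular area is then taken.
[0069]
Under control of the POUC 209, the pixel data X1 to X16 to be encoded and the pixel data R1 to R16 of one rectangular area are transferred to the input buffer group 22. One line of the rectangular area is transferred as R1 to R16 every clock, so 16 lines of R1 to R16 are transferred for one rectangular area.
According to FIG. 17, in the pixel processing unit 1 shown in FIG. 4, for example, at the first clock the adder A subtracts the pixel data R1 of the input port B from the pixel data X1 of the input port A and takes the absolute value, and the result passes through the multiplier A unchanged (multiplied by 1). The adder B outputs the sum of the multiplier output and the data held in the delay unit D. At the first clock the adder B therefore outputs |X1-R1| for the first line.
[0070]
At the second clock, the delay unit D holds |X1-R1| for the first line, so the adder B adds |X1-R1| for the second line from the multiplier A to |X1-R1| for the first line held in the delay unit D.
At the third clock, the delay unit D holds the accumulated |X1-R1| for the first and second lines, so the adder B adds |X1-R1| for the third line from the multiplier A to the accumulated value for the first and second lines held in the delay unit D.
[0071]
By repeating this, at the 16th clock the adder B outputs the accumulated value of |X1-R1| over lines 1 to 16 (Σ|X1-R1|).
The pixel processing units 2 to 16 likewise output the accumulated values (Σ|X2-R2|) to (Σ|X16-R16|), respectively.
These 16 accumulated values are held in the output buffer group 23 at the 17th clock, taken out by the POUC 209, and, after the total of the 16 accumulated values has been calculated, saved in a work area in the external memory 220.
[0072]
This completes the calculation of the sum of the pixel data differences between one rectangular area and the macroblock to be encoded.
The sum of differences is then calculated in the same way for the other rectangular areas in the MV search range. When the sums have been calculated for all the rectangular areas in the MV search range (or for the necessary rectangular areas), the rectangular area with the smallest value is judged to be the most highly correlated, and the motion vector is generated.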
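The accumulation of absolute differences and the choice of the minimum can be sketched as follows. This is an illustrative full search under assumptions not in the text: the function names, the `(dx, dy)` vector convention, and the skipping of candidates that fall outside the reference frame are all hypothetical.

```python
def block_sad(target, candidate):
    """Sum of absolute differences of two 16x16 blocks, one line per clock."""
    column_acc = [0] * 16                       # delay D of each pixel processing unit
    for t_line, c_line in zip(target, candidate):
        for i in range(16):
            column_acc[i] += abs(t_line[i] - c_line[i])
    return sum(column_acc)                      # total of the 16 accumulated values


def best_motion_vector(target, reference, top, left, search):
    """Return (sad, (dx, dy)) of the best candidate within +/-search pixels."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + 16 > len(reference) or x + 16 > len(reference[0]):
                continue                        # candidate lies outside the frame
            cand = [row[x:x + 16] for row in reference[y:y + 16]]
            sad = block_sad(target, cand)
            if best is None or sad < best[0]:
                best = (sad, (dx, dy))
    return best
```

The inner loop over `i` corresponds to the 16 units working in parallel, so one rectangular area costs 16 clocks plus the final summation.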
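[0073]
In the ME processing above, the 16 accumulated values from the pixel processing units are totaled separately, but the total may instead be calculated in the pixel processing units 1 to 16. In that case, the 16 accumulated values for one rectangular area are saved as they are from the output buffer group 23 into the work area of the external memory 220, and when accumulated value groups for 16 or more rectangular areas have been saved in the work area, each of the pixel processing units 1 to 16 takes charge of one rectangular area and accumulates its 16 accumulated values in turn to obtain the sum of differences.
[0074]
The ME processing above calculates differences in units of pixels, but it may be performed in units of half pels. In that case, of the half lines and real lines, |X1-R1| is calculated in one clock for real lines as described above, while for half lines, for example, the half-pel pixel value ((R1+R1')/2) is calculated in one of two clocks and the difference |X1-(R1+R1')/2| in the next clock. Alternatively, the half-pel pixel value ((R1+R1'+R2+R2')/4) may be calculated in four of five clocks and the difference in the next clock.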
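<3.1 Vertical filter processing (part 1)>
FIG. 19 is a schematic block diagram of the media processor of FIG. 2, showing the flow of data during vertical filter processing.
[0075]
In the figure, the decoder unit 301 corresponds to the VLD 205, the TE 206, and the POUA 207 (MC processing) in FIG. 2, and decodes (decompresses) the video elementary stream.
The frame memory 302 corresponds to the external memory 220 and holds the decoded video data (frame data).
The vertical filter unit 303 corresponds to the POUB 208 and performs vertical reduction by vertical filter processing.
The buffer memory 304 corresponds to the external memory 220 and holds the reduced video data (display frame data).
[0076]
The image output unit 305 corresponds to the video buffer memory 212 and the video unit 213, and converts the display frame data into a video signal and outputs it.
The POUA 207 takes charge of MC processing and the POUB 208 of vertical filter processing. Horizontal reduction by horizontal filter processing is performed on the decoded frame data in the frame memory 302 by either the POUA 207 or the POUB 208.
<3.1.1 1/2 reduction>
FIG. 20 shows how the data supply states of the frame memory 302 and the buffer memory 304 change over time when 1/2 reduction is performed in FIG. 19.
[0077]
In FIG. 20, the vertical axes of the graphs 701 to 703 show time in units of the period V of the vertical synchronization signal of a field. The figure covers five periods, and the time axes of the graphs 701 to 703 coincide. The horizontal axis of the graph 701 shows the amount of data in the frame memory 302. The horizontal axis of the graph 702 shows the amount of data in the buffer memory 304. The graph 703 shows the frame (field) being output by the image output unit 305.
[0078]
The solid line 704 in the graph 701 shows the supply of frame data from the decoder unit 301 to the frame memory 302. The broken line 705 shows the supply of frame data from the frame memory 302 to the vertical filter unit 303.
The broken line 706 in the graph 702 shows the supply of the 1st-field reduced image from the vertical filter unit 303 to the buffer memory 304. The dash-dot line 707 shows the supply of the 2nd-field reduced image from the vertical filter unit 303 to the buffer memory 304.
[0079]
The solid line 708 in the graph 702 shows the supply of the 1st-field reduced image data from the buffer memory 304 to the image output unit 305. With 1/2 reduction, the display position of the reduced image can range from the upper half to the lower half of the frame, so the timing of the solid line 708 in the figure differs according to the display position. Similarly, the solid line 709 shows the supply of the 2nd-field reduced image data from the buffer memory 304 to the image output unit 305.
[0080]
As the graph 701 shows, control is performed so that the supply of the frame data of frame n from the decoder unit 301 to the frame memory 302 starts immediately after the supply of the 2nd field of frame n-1 from the frame memory 302 to the vertical filter unit 303 has started, and ends just before the supply of the 1st field of frame n from the frame memory 302 to the vertical filter unit 303 is completed.
[0081]
As the graph 702 shows, control is performed so that the supply of the 1st-field frame data of frame n from the vertical filter unit 303 to the buffer memory 304 is completed while the 2nd field of frame n-1 is displayed, and the supply of the 2nd-field frame data of frame n is completed while the 1st field of frame n is displayed.
With this control, it is sufficient for the path from the decoder unit 301 to the frame memory 302 to be able to transfer one frame of frame data in a period of 2V, and for the path from the frame memory 302 to the vertical filter unit 303 to be able to transfer 1/2 frame in a period of 1V. It is sufficient for the decoder unit 301 to have the computing power to generate one frame of frame data in 2V, and for the vertical filter unit 303 to have the computing power to filter 1/2 frame in 1V. It is sufficient for the path from the vertical filter unit 303 to the buffer memory 304 to transfer 1/4 frame in 1V, and for the path from the buffer memory 304 to the image output unit 305 to transfer 1/4 frame in 1V. It is sufficient for the frame memory 302 to hold one frame of frame data and for the buffer memory 304 to hold 1/2 frame.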
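[0082]
Next, for comparison with FIG. 20, FIG. 21 shows how the data supply state changes over time when the buffer memory 304 is not provided.
When no reduction is performed, the supply of the digital image data of frame n to the frame memory 302, shown by the solid line 506, starts when the supply of the 2nd field of frame n-1 to the vertical filter unit 303, shown by the broken line 507, begins, and ends before the supply of the 1st field of frame n to the vertical filter unit 303, shown by the broken line 508, is completed. One frame of digital image data is therefore supplied at a constant rate during the 2V period shown on the graph of FIG. 21.
[0083]
The supply of the 1st field of frame n from the frame memory 302 to the vertical filter unit 303, shown by the broken line 508, is completed immediately after the supply of the digital image data of frame n to the frame memory 302, shown by the solid line 511, ends, and processing of the 2nd field then starts. The digital image data is therefore supplied from the frame memory 302 to the vertical filter unit 303 at a constant rate of one field per 1V period shown on the graph of FIG. 21.
[0084]
When 1/2 reduction is performed, however, the time at which the supply of the digital image data of frame n to the frame memory 302 can start depends on the display position of the 2nd field of frame n-1. Depending on that display position, the supply of digital image data from the frame memory 302 to the vertical filter unit 303 takes place somewhere between the broken lines 509 and 510, and the time at which the supply of frame n to the frame memory 302 can start is latest for the display position shown by the broken line 510. In this case the 1/2 reduced image is output in the lower half of the image output unit 501. The supply of the digital image data of frame n to the frame memory 302 must also have ended before the supply of the 1st field of frame n to the vertical filter unit 303, shown by the broken line 511, is completed. One frame of digital image data must therefore be supplied at a constant rate during the 1V period shown on the graph of FIG. 21, which requires twice the supply capability of the case without reduction.
[0085]
The supply of the 1st field of frame n from the frame memory 302 to the vertical filter unit 303, shown by the broken line 511, is completed immediately after the supply of the digital image data of frame n to the frame memory 302, shown by the solid line 512, ends, and processing of the 2nd field then starts. One field of digital image data must therefore be supplied at a constant rate during the 1/2V period shown on the graph of FIG. 21, which requires twice the supply capability of the case without reduction. The vertical filter unit 303 also needs performance matching the supplied digital image data, and therefore twice the computing power of the case without reduction.
[0086]
For comparison with FIG. 20, FIG. 23 shows how the data supply state changes over time when the buffer memory 304 is not provided and 1/4 reduction is performed.
For the same reasons as above, the supply capability of digital image data to the frame memory 302, the supply capability from the frame memory 302 to the vertical filter unit 303, and the computing power of the vertical filter unit each need to be four times those of the case without reduction. Thus, without the buffer memory 304, the required peak performance grows as the reduction ratio increases.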
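<3.1.2 1/4 reduction>
FIG. 22 shows the data supply state of each part and its change over time when the media processor shown in FIG. 19 performs 1/4 reduction.
[0087]
In FIG. 22, the horizontal and vertical axes of the graphs are the same as in FIG. 20.
The solid line 804 shows the supply of frame data from the decoder unit 301 to the frame memory 302. The broken line 805 shows the supply of frame data from the frame memory 302 to the vertical filter unit 303. The broken line 806 shows the supply of the 1st-field reduced image data from the vertical filter unit 303 to the buffer memory 304. The broken line 807 shows the supply of the 2nd-field reduced image data from the vertical filter unit 303 to the buffer memory 304. The solid line 808 shows the supply of the 1st-field reduced image data from the buffer memory 304 to the image output unit 305. The solid line 809 shows the supply of the 2nd-field reduced image data from the buffer memory 304 to the image output unit 305.
[0088]
As shown in the figure, it is sufficient for the path from the decoder unit 301 to the frame memory 302 to transfer one frame of frame data in 2V, and for the path from the frame memory 302 to the vertical filter unit 303 to transfer 1/2 frame in 1V. It is sufficient for the decoder unit 301 to have the computing power to generate one frame in 2V. It is sufficient for the vertical filter unit 303 to have the computing power to filter 1/2 frame in 1V, for the path from the vertical filter unit 303 to the buffer memory 304 to transfer 1/8 frame in 1V, and for the path from the buffer memory 304 to the image output unit 305 to transfer 1/8 frame in 1V. A frame memory 302 that can hold one frame of frame data and a buffer memory 304 that can hold 1/4 frame are required.
[0089]
Each of these required capabilities is an average over a period of at least 1V, and a larger reduction ratio does not demand a large peak performance over a short period. The most processing performance is needed when there is no reduction. In that case, a transfer capability of one frame in 2V is enough from the decoder unit 301 to the frame memory 302, and 1/2 frame in 1V is enough from the frame memory 302 to the vertical filter unit 303. The computing power to generate one frame in 2V is enough for the decoder unit 301, and the computing power to filter 1/2 frame in 1V is enough for the vertical filter unit 303. A transfer capability of 1/2 frame in 1V is enough from the vertical filter unit 303 to the buffer memory 304, and 1/2 frame in 1V is enough from the buffer memory 304 to the image output unit 305. It is enough for the frame memory 302 to hold one frame of frame data and for the buffer memory 304 to hold one frame. Any vertical reduction can be performed with these capabilities. The circuit scale can thereby be reduced and the operating clock lowered.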
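<3.2 Vertical filter processing (part 2)>
FIG. 24 is a schematic block diagram showing the flow of data when the media processor performs vertical filter processing.
[0090]
The figure consists of a decoding unit 401, a buffer memory 402, a vertical filter unit 403, a buffer memory 404, a video output unit 405, and a control unit 406. The decoding unit 401, the vertical filter unit 403, the buffer memory 404, and the video output unit 405 are the same as the components of the same names in FIG. 19. The points in common are therefore omitted, and the description centers on the differences.
[0091]
The buffer memory 402 differs from the frame memory 302 in that a capacity smaller than one frame is sufficient.
The vertical filter unit 403 differs from the vertical filter unit 303 in that it notifies the control unit 406 (filter state) each time it finishes filtering 64 lines in the vertical direction (4 macroblock lines of the frame before processing). The unit of notification may instead be 2 or 3 macroblock lines.
[0092]
The decoding unit 401 differs from the decoder unit 301 in that it notifies the control unit 406 (decode state) each time it finishes decoding 64 lines. The unit of notification may instead be 16 lines.
The control unit 406 corresponds to the IOP 211 in FIG. 2. It monitors the operating states of the decoding unit 401 and the vertical filter unit 403 based on their notifications, and controls the decoding unit 401 and the vertical filter unit 403 so that vertical filter processing does not overtake decoding and decoding does not overtake vertical filter processing. That is, the control unit 406 prevents two situations. One is that the vertical filter unit 403 filters the pixel data of a macroblock line of the previous frame (or field) because the decoding unit 401 has not yet written the pixel data of the macroblock line to be filtered into the buffer memory 402. The other is that the decoding unit 401 overwrites, with the pixel data of the next frame, a macroblock line that the vertical filter unit 403 is due to filter but has not yet processed.
[0093]
FIG. 25 is an explanatory diagram showing the control performed by the control unit 406.
The horizontal axis of the figure is time, and the operations of the control unit 406, VSYNC (vertical synchronization signal), the decoding unit 401, the vertical filter unit 403, and the video output unit 405 are shown.
As shown in the figure, the decoding unit 401 notifies the control unit 406 each time it finishes decoding 64 lines, and the vertical filter unit 403 notifies the control unit 406 each time it finishes filtering 64 lines. Based on these notifications, the control unit 406 holds and updates the line number Nd up to which decoding is complete and the line number Nf up to which filtering is complete, and controls the decoding unit 401 and the vertical filter unit 403 so that Nd (current frame) > Nf (current frame) and Nd (next frame) < Nf (current frame) are satisfied. Specifically, when Nd and Nf come close to each other (when their difference falls to or below a threshold), the control unit 406 temporarily stops one of the decoding unit 401 and the vertical filter unit 403. Nd and Nf may also be macroblock line numbers.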
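The stall decision based on Nd and Nf can be sketched as follows. This is only one possible reading of the two inequalities: the function name, the `decoding_next_frame` flag, the return values, and the threshold handling are assumptions introduced for illustration and do not appear in the text.

```python
def decide_stall(nd, nf, decoding_next_frame, threshold):
    """Return which unit to pause ('filter', 'decoder') or None.

    nd: last line number decoded, nf: last line number filtered (current frame).
    """
    if not decoding_next_frame:
        # Decoder is still writing the frame being filtered: keep Nd > Nf,
        # so pause the filter when it gets too close behind the decoder.
        if nd - nf <= threshold:
            return "filter"
    else:
        # Decoder is overwriting the buffer with the next frame: keep Nd < Nf,
        # so pause the decoder before it reaches lines not yet filtered.
        if nf - nd <= threshold:
            return "decoder"
    return None
```

Calling this each time a 64-line notification arrives, with a threshold of 64 lines, would pause the filter when it is only one notification unit behind the decoder.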
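[0094]
When Nd and Nf come close, one of the decoding unit 401 and the vertical filter unit 403 is temporarily stopped under control of the control unit 406. However, the judgment of whether Nd and Nf have come close, and the control that temporarily stops the decoding unit 401 or the vertical filter unit 403, may be handled by something other than the control unit 406.
[0095]
For example, the vertical filter unit 403 may notify the decoding unit 401 of the filter state, and the decoding unit 401 may judge from the notified filter state and its own decode state whether Nd and Nf have come close, and according to the result either temporarily stop its decoding or temporarily stop the vertical filter unit 403.
[0096]
Conversely, the decoding unit 401 may notify the vertical filter unit 403 of the decode state, and the vertical filter unit 403 may judge from the notified decode state and its own filter state whether Nd and Nf have come close, and according to the result either temporarily stop its filter processing or temporarily stop the decoding unit 401.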
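<3.2.1 1/2 reduction>
FIG. 26 shows the amount of data supplied by each part when 1/2 reduction is performed in FIG. 24.
[0097]
The horizontal axis of the graph 901 shows the amount of frame data in the buffer memory 402, and the vertical axis shows time. The horizontal axis of the graph 902 shows the amount of frame data in the buffer memory 404, and the vertical axis shows time. The graph 903 arranges the states of the image output unit 405 in time series, and its time axis matches the vertical axes of the graphs 901 and 902.
[0098]
The solid line 904 shows the supply of frame data from the decoder unit 401 to the buffer memory 402. The broken line 905 shows the supply of frame data from the buffer memory 402 to the vertical filter unit 403. The broken line 906 shows the supply of the 1st-field reduced image data from the vertical filter unit 403 to the buffer memory 404. The broken line 907 shows the supply of the 2nd-field reduced image data from the vertical filter unit 403 to the buffer memory 404. The solid line 908 shows the supply of the 1st-field reduced image data from the buffer memory 404 to the image output unit 405. The solid line 909 shows the supply of the 2nd-field reduced image data from the buffer memory 404 to the image output unit 405.
[0099]
As the graph 901 shows, control is performed so that the supply of the frame data of frame n from the buffer memory 402 to the vertical filter unit 403 starts immediately after the supply of frame n from the decoder unit 401 to the buffer memory 402 has started, and ends immediately after the supply of frame n from the decoder unit 401 to the buffer memory 402 has ended. As the graph 902 shows, control is performed so that the supply of the frame data of frame n from the vertical filter unit 403 to the buffer memory 404 is completed while frame n-1 is displayed.
[0100]
With this control, the following are required: a transfer capability of one frame of frame data in 2V from the decoder unit 401 to the buffer memory 402; a transfer capability of one frame in 2V from the buffer memory 402 to the vertical filter unit 403; computing power in the decoder unit 401 to generate one frame in 2V; computing power in the vertical filter unit 403 to filter one frame in 2V; a transfer capability of 1/2 frame in 2V from the vertical filter unit 403 to the buffer memory 404; a transfer capability of 1/4 frame in 1V from the buffer memory 404 to the image output unit 405; a buffer memory 402 that can hold several lines of frame data; and a buffer memory 404 that can hold one frame of frame data.
<3.2.2 1/4 reduction>
FIG. 27 shows the amount of data supplied by each part when 1/4 reduction is performed in FIG. 24.
[0101]
The horizontal axis of the graph 1001 shows the amount of frame data in the buffer memory 402, and the vertical axis shows time. The horizontal axis of the graph 1002 shows the amount of frame data in the buffer memory 404, and the vertical axis shows time. The graph 1003 arranges the states of the image output unit 405 in time series, and its time axis matches the vertical axes of the graphs 1001 and 1002.
[0102]
The solid line 1004 shows the supply of frame data from the decoder unit 401 to the buffer memory 402. The broken line 1005 shows the supply of frame data from the buffer memory 402 to the vertical filter unit 403. The broken line 1006 shows the supply of the 1st-field reduced image data from the vertical filter unit 403 to the buffer memory 404. The broken line 1007 shows the supply of the 2nd-field reduced image data from the vertical filter unit 403 to the buffer memory 404. The solid line 1008 shows the supply of the 1st-field reduced image data from the buffer memory 404 to the image output unit 405. The solid line 1009 shows the supply of the 2nd-field reduced image data from the buffer memory 404 to the image output unit 405.
[0103]
With this control, a transfer capability of one frame in 2V is enough from the decoder unit 401 to the buffer memory 402, one frame in 2V is enough from the buffer memory 402 to the vertical filter unit 403, the computing power to generate one frame in 2V is enough for the decoder unit 401, the computing power to filter one frame in 2V is enough for the vertical filter unit 403, 1/4 frame in 2V is enough from the vertical filter unit 403 to the buffer memory 404, and 1/8 frame in 1V is enough from the buffer memory 404 to the image output unit 405. It is enough for the buffer memory 402 to hold several lines of frame data and for the buffer memory 404 to hold 1/2 frame of frame data.
[0104]
Each of these required capabilities is an average over a period of at least 1V, and a larger reduction ratio does not demand a large peak performance over a short period.
The most processing performance is needed when there is no reduction. What is required in that case is a transfer capability of one frame in 2V from the decoder unit 401 to the buffer memory 402, one frame in 2V from the buffer memory 402 to the vertical filter unit 403, computing power in the decoder unit 401 to generate one frame in 2V, computing power in the vertical filter unit 403 to filter one frame in 2V, a transfer capability of one frame in 2V from the vertical filter unit 403 to the buffer memory 404, 1/2 frame in 1V from the buffer memory 404 to the image output unit 405, a buffer memory 402 that can hold several lines of frame data, and a buffer memory 404 that can hold two frames of frame data. Any vertical reduction can be performed with these capabilities. The circuit scale can thereby be reduced and the operating clock lowered.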
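<4 Modifications>
FIGS. 28 and 29 show a first modification of the left half and the right half of the pixel parallel processing unit. Components that are the same as in FIGS. 4 and 5 carry the same reference signs and are not described again; the description centers on the differences.
[0105]
FIGS. 28 and 29 have pixel processing units 1a to 16a in place of the pixel processing units 1 to 16 of FIGS. 4 and 5, and pixel transfer units 17a and 18a in place of the pixel transfer units 17 and 18. The pixel processing units 1a to 16a all have the same configuration, so the pixel processing unit 1a is described as a representative.
The pixel processing unit 1a has selection units A104a and B105a in place of the selection units A104 and B105 of the pixel processing unit 1.
[0106]
The selection unit A104a differs from the selection unit A104 in having three inputs instead of two. The added input is pixel data from the delay unit (delay unit B) of the pixel transfer unit (or pixel processing unit) two positions away.
The selection unit B105a likewise has an added pixel data input from the delay unit (delay unit B) of the pixel transfer unit (or pixel processing unit) two positions away.
[0107]
The pixel transfer unit 17a has selection units B1703a to G1708a in place of the selection units B1703 to G1708. Each of the selection units B1703a to G1708a has three inputs instead of two. The added input is pixel data from the delay unit two positions to the left.
The pixel transfer unit 18a has selection units B1803a to G1808a in place of the selection units B1803 to G1808. Each of the selection units B1803a to G1808a has three inputs instead of two. The added input is pixel data from the delay unit two positions to the right.
[0108]
This configuration allows filter processing that uses the target pixel and, in turn, every second pixel to its left and right.
For example, the pixel processing unit 1a can compute expressions such as the following.
a0・X9+a1(X11+X7)+a2(X13+X5)+a3(X15+X3)
FIGS. 30 and 31 show a second modification of the left half and the right half of the pixel parallel processing unit.
[0109]
FIGS. 30 and 31 have pixel processing units 1b and 16b in place of the pixel processing units 1 and 16 of FIGS. 4 and 5.
The pixel processing unit 1b has a selection unit B105b in place of the selection unit B105 of the pixel processing unit 1. The selection unit B105b differs from the selection unit B105 in having a feedback input from the delay unit B107.
[0110]
The pixel processing unit 16b has a selection unit A1604b in place of the selection unit A1604 of the pixel processing unit 16. The selection unit A1604b differs from the selection unit A1604 in having a feedback input from the delay unit A1606.
With this configuration, the pixel processing unit 1b performs, for example, the following operation.
a3*X6＋a2*X7＋a1*X8＋a0*X9＋a1*X10＋a2*X11＋a3*X12
The output of the pixel processing unit 15 is then as follows.
[0111]
a3*X20＋a2*X21＋a1*X22＋a0*X23＋a1*X24＋a2*X24＋a3*X24
The output of the pixel processing unit 16b is then as follows.
a3*X21＋a2*X22＋a1*X23＋a0*X24＋a1*X24＋a2*X24＋a3*X24
Thus in FIGS. 30 and 31, when the pixel data at the left end of the data row has been transferred to the leftmost pixel processing unit 1b, the selection unit B105b selects the feedback input from the delay unit B in the pixel processing unit 1b. When the pixel data at the right end of the data row has been transferred to the rightmost pixel processing unit 16b, the selection unit A1604b selects the feedback input from the delay unit A1606.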
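[0112]
FIGS. 32 and 33 show a third modification of the left half and the right half of the pixel parallel processing unit.
FIGS. 32 and 33 have pixel processing units 1c to 16c in place of the pixel processing units 1 to 16 of FIGS. 4 and 5, and pixel transfer units 17c and 18c in place of the pixel transfer units 17 and 18. The pixel processing units 1c to 16c all have the same configuration, so the pixel processing unit 1c is described as a representative.
[0113]
The pixel processing unit 1c has selection units A104c and B105c in place of the selection units A104 and B105 of the pixel processing unit 1.
The selection unit A104c differs from the selection unit A104 in having three inputs instead of two. The added input is pixel data from the delay unit (delay unit B) of the pixel transfer unit (or pixel processing unit) two positions away.
[0114]
The selection unit B105c has two added inputs: pixel data from the delay unit (delay unit B) of the pixel transfer unit (or pixel processing unit) two positions away, and a feedback input from the delay unit B107.
The pixel transfer units 17c and 18c have three inputs instead of two, like the pixel transfer units 17a and 18a shown in FIGS. 28 and 29.
[0115]
With this configuration, the pixel processing unit 1c performs, for example, the following operation.
a3*X9＋a2*X9＋a1*X9＋a0*X9＋a1*X11＋a2*X13＋a3*X15
The output of the pixel processing unit 2c is then as follows.
a3*X10＋a2*X10＋a1*X10＋a0*X10＋a1*X12＋a2*X14＋a3*X16
The output of the pixel processing unit 15c is then as follows.
a3*X17＋a2*X19＋a1*X21＋a0*X23＋a1*X23＋a2*X23＋a3*X23
The output of the pixel processing unit 16c is then as follows.
a3*X18＋a2*X20＋a1*X22＋a0*X24＋a1*X24＋a2*X24＋a3*X24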
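The example outputs of the modifications can be reproduced with a small functional model. It is a sketch, not the shifter hardware: the function name and the `stride` parameter are illustrative, and the feedback input is modeled as "keep the last pixel that was still inside the data row". A stride of 1 with edge hold corresponds to the second modification, and a stride of 2 with edge hold to the third.

```python
def filter_edge_hold(pixels, coeffs, stride=1):
    """Symmetric filter over one data row, holding the last valid pixel at the ends."""
    n = len(pixels)
    out = []
    for i in range(n):
        acc = coeffs[0] * pixels[i]
        left = right = i
        for k in range(1, len(coeffs)):
            if left - stride >= 0:       # otherwise the feedback input re-selects the held pixel
                left -= stride
            if right + stride < n:
                right += stride
            acc += coeffs[k] * (pixels[left] + pixels[right])
        out.append(acc)
    return out
```

With `pixels = list(range(9, 25))` standing for X9 to X24 and `stride=2`, element 0 evaluates a0*X9+a1(X9+X11)+a2(X9+X13)+a3(X9+X15), which matches the expression given for the pixel processing unit 1c.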
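FIG. 34 shows a modification of the POUA 207.
[0116]
Compared with FIG. 3, the POUA 207 in the figure additionally has an upsampling circuit 22a and a downsampling circuit 23a. The points in common with FIG. 3 are omitted, and the description centers on the differences.
The upsampling circuit 22a enlarges the pixel data groups input from the input buffer group 22 in the vertical direction. For example, to interpolate pixel data so that the pixel data groups input from the input buffer group 22 are doubled vertically, it outputs the same pixel data group to the pixel parallel processing unit 21 twice for each input of a pixel data group from the input buffer group 22.
[0117]
The downsampling circuit 23a reduces the pixel data groups input from the pixel parallel processing unit 21 in the vertical direction. For example, it thins out pixel data so that the pixel data groups input from the pixel parallel processing unit 21 are halved vertically. That is, for every two inputs of pixel data groups from the pixel parallel processing unit 21, it discards one and outputs one.
[0118]
With this configuration, the data is doubled vertically on the input side of the pixel parallel processing unit 21 and halved vertically on the output side, so the amount of data per frame in the external memory 220 can be halved in the vertical direction, and as a result the amount of data transferred to the POUA 207 by the POUC 209 can be halved. This relieves the bus bottleneck when accesses concentrate on the internal port of the dual port memory 100.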
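The line repetition and line dropping described above can be sketched as follows. The function names are hypothetical, each list element stands for one 16-pixel data group (one line), and which of each pair of lines is discarded is an assumption, since the text only says that one of two is dropped.

```python
def upsample_lines(lines):
    """Emit every input pixel data group twice (vertical x2)."""
    out = []
    for line in lines:
        out.append(line)
        out.append(line)
    return out


def downsample_lines(lines):
    """Keep one of every two pixel data groups (vertical x1/2); the first is kept here."""
    return lines[::2]
```

Only the lines between the two circuits are doubled, so the stored frame and the transfers through the dual port memory stay at half height.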
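[0119]
Because the pixel arithmetic device of the present invention performs the filtering used for resizing images and the like on a plurality of pixels in parallel, it is used in digital video equipment such as media processors that handle compression and decompression of moving pictures, resizing, and the like.
[0120]
[Effects of the invention]
The pixel arithmetic device of the present invention performs filter processing and comprises N pixel processing means, supply means for supplying N pixel data and filter coefficients, and control means for operating the N pixel processing means in parallel.
Each pixel processing means performs an operation using the pixel data and the filter coefficient supplied by the supply means, then acquires pixel data from the adjacent pixel processing means, performs an operation using the acquired pixel data, and accumulates the results. The control means controls the N pixel processing means so that the acquisition of pixel data from the adjacent pixel processing means, and the operation and accumulation using the acquired pixel data, are repeated a number of times that depends on the number of taps.
[0121]
Here, the N pixel processing means form a first shifter that shifts N pixel data to the right and a second shifter that shifts N pixel data to the left. Each pixel processing means performs its operation using the two pixel data shifted out from its two adjacent pixel processing means.
This configuration has the effect that the number of taps can be made variable and that filtering can be sped up without raising the frequency.
[0122]
The pixel arithmetic device of the present invention also supplies, as the pixel data, pixel data of a difference image and pixel data of a reference frame from the supply means.
This configuration has the effect that the device can be used not only for filtering but also for MC (motion compensation) processing, so a filter device and an MC circuit need not be provided separately and the circuit scale can be reduced.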
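[Brief description of the drawings]
[FIG. 1] A block diagram showing an example of a circuit that performs FIR filter processing in the prior art.
[FIG. 2] A block diagram showing the configuration of a media processor that includes the pixel operation unit.
[FIG. 3] A block diagram showing the configuration of the pixel operation unit (POUA, POUB).
[FIG. 4] A block diagram showing the configuration of the left half of the pixel parallel processing unit.
[FIG. 5] A block diagram showing the configuration of the right half of the pixel parallel processing unit.
[FIG. 6] (a) is a block diagram showing the detailed configuration of the input buffer group 22.
(b) is a block diagram showing the detailed configuration of the selection unit in the input buffer group 22.
[FIG. 7] A block diagram showing the configuration of the output buffer group 23.
[FIG. 8] A diagram showing the initial input values of pixel data when the pixel operation unit performs filter processing.
[FIG. 9] An explanatory diagram showing the initial input values of pixel data for the pixel processing unit 1.
[FIG. 10] A diagram showing the calculation process of filter processing in the pixel processing unit 1.
[FIG. 11] An explanatory diagram showing the calculations of filter processing in the pixel processing unit 1.
[FIG. 12] A diagram showing the input and output pixel data when the pixel operation unit performs MC (motion compensation) processing (P picture).
[FIG. 13] An explanatory diagram showing the frame to be decoded and the reference frame in MC processing.
[FIG. 14] A diagram showing the input and output pixel data when the pixel operation unit performs MC processing (B picture).
[FIG. 15] A diagram showing the input and output pixel data when the pixel operation unit performs OSD (on-screen display) processing.
[FIG. 16] An explanatory diagram of OSD (on-screen display) processing in the pixel operation unit.
[FIG. 17] A diagram showing the input and output pixel data when the pixel operation unit performs ME (motion estimation) processing.
[FIG. 18] An explanatory diagram of ME (motion estimation) in the pixel operation unit.
[FIG. 19] A schematic block diagram showing the flow of data when the media processor performs vertical filter processing.
[FIG. 20] An explanatory diagram for vertical 1/2 reduction.
[FIG. 21] An explanatory diagram for vertical 1/2 reduction in the prior art.
[FIG. 22] An explanatory diagram for vertical 1/4 reduction.
[FIG. 23] An explanatory diagram for vertical 1/4 reduction in the prior art.
[FIG. 24] Another schematic block diagram showing the flow of data when the media processor performs vertical filter processing.
[FIG. 25] An explanatory diagram showing the timing of decoding and vertical filter processing.
[FIG. 26] An explanatory diagram for vertical 1/2 reduction.
[FIG. 27] An explanatory diagram for vertical 1/4 reduction.
[FIG. 28] A diagram showing the first modification of the left half of the pixel parallel processing unit.
[FIG. 29] A diagram showing the first modification of the right half of the pixel parallel processing unit.
[FIG. 30] A diagram showing the second modification of the left half of the pixel parallel processing unit.
[FIG. 31] A diagram showing the second modification of the right half of the pixel parallel processing unit.
[FIG. 32] A diagram showing the third modification of the left half of the pixel parallel processing unit.
[FIG. 33] A diagram showing the third modification of the right half of the pixel parallel processing unit.
[FIG. 34] A diagram showing a modification of the pixel operation unit.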
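[Explanation of symbols]
1 Pixel processing unit
1 to 16 Pixel processing units
17 Pixel transfer unit
18 Pixel transfer unit
21 Pixel parallel processing unit
22 Input buffer group
22a Upsampling circuit
23 Output buffer group
23a Downsampling circuit
23a to 23h Latches
23a to 23p Latches
23i to 23p Latches
24 Instruction memory
24a to 24h Selectors
24a to 24p Selectors
24i to 24p Selectors
25 Instruction decoder
26 Instruction circuit
27 DDA circuit
100 Dual port memory
104 Selection unit A
104a Selection unit A
104c Selection unit A
105 Selection unit B
105a Selection unit B
105b Selection unit B
105c Selection unit B
107 Delay unit B
109 Delay unit D
120 Adder A
200 Media processor
201 Stream unit
201 Input port A
202 I/O buffer
202 Input port B
203 Setup processor
203 Input port C
204 Bit stream FIFO
205 Variable length code decoding unit
206 TE
207 POUA
208 POUB
209 POUC
210 Audio unit
211 IOP
212 Video buffer memory
213 Video unit
214 Host unit
215 RE
216 Filter unit
217 Setup memory
218 Dedicated LSI
220 External memory
401 Decoder unit
402 Buffer memory
403 Vertical filter unit
404 Buffer memory
405 Video output unit
406 Control unit
[0001]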
BACKGROUND OF THE INVENTION
The present invention relates to a pixel arithmetic device including a filtering circuit for resizing an image.
[0002]
[Prior art]
In recent years, technological progress of digital video equipment has been remarkable, and so-called media processors that handle video compression / expansion processing, resizing, and the like have been put into practical use.
For image resizing, an FIR (Finite Impulse Response) filter is often used.
[0003]
FIG. 1 is a block diagram showing an example of a circuit that performs FIR filter processing in the prior art. This figure shows an FIR filter with 7 taps and symmetrical coefficients.
In the figure, data input in time series from the data input terminal 1001 is transferred through the delay units 1002, 1003, 1004, 1005, 1006, and 1007 in this order. When the filter coefficients are symmetric, that is, when the coefficients corresponding to the input of the data input terminal and the outputs of the delay units (called taps) are symmetric about the center tap (the output of the delay unit 1004), the data of taps that share a coefficient are added together first and the sum is then multiplied by the coefficient, instead of multiplying each tap's data by its coefficient separately.
[0004]
For example, the input data of the data input unit 1001 and the output data of the delay unit 1007 are added by the adder 1008, and the multiplier 1008 multiplies the addition result by the coefficient h0. The output of the delay unit 1002 and the output of the delay unit 1006 are added by the adder 1009, and the multiplier 1009 multiplies the addition result by the coefficient h1.
Output data of the multipliers 1011 to 1014 are added by an adder 1015. The output data of the adder 1015 is output from the data output terminal 1016 in time series as a filter processing result. The coefficients h0 to h3 are determined according to the image reduction ratio. For example, if the reduction ratio is 1/2, a reduced image can be obtained by thinning out the time series output data to 1/2.
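The folding of symmetric taps described above can be illustrated with a short Python sketch. It is not the circuit of FIG. 1 itself. The function names, the list-based signal, and the convention that h[3] is the center coefficient are assumptions made for the illustration.

```python
def fir_direct(x, h):
    """7-tap symmetric FIR with one multiplication per tap (7 per output)."""
    full = [h[0], h[1], h[2], h[3], h[2], h[1], h[0]]
    return [sum(c * x[n + k] for k, c in enumerate(full))
            for n in range(len(x) - 6)]


def fir_folded(x, h):
    """Same filter, but taps sharing a coefficient are added first (4 multiplications)."""
    out = []
    for n in range(len(x) - 6):
        w = x[n:n + 7]
        out.append(h[0] * (w[0] + w[6])     # outermost pair, coefficient h0
                   + h[1] * (w[1] + w[5])   # next pair, coefficient h1
                   + h[2] * (w[2] + w[4])   # inner pair, coefficient h2
                   + h[3] * w[3])           # center tap, coefficient h3
    return out


def reduce_by_half(y):
    """Thin the time-series output to 1/2 by keeping every second sample."""
    return y[::2]


signal = list(range(1, 13))
coefficients = [1, 2, 3, 4]          # h0..h3, illustrative values only
filtered = fir_folded(signal, coefficients)
reduced = reduce_by_half(filtered)
```

Both functions give the same output. The folded form needs four multiplications per output sample instead of seven, which is why the prior-art circuit adds the paired taps before multiplying.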
[0005]
The filter coefficients are chosen to be symmetric because this gives a linear phase (the phase characteristic is a straight line with respect to frequency), which is visually preferable for images.
[0006]
[Problems to be solved by the invention]
However, in the above conventional method, when filtering is performed on image data, the circuit configuration requires the pixel data constituting the image to be input sequentially from one end, so only one pixel data item can be input per clock. The operating frequency must therefore be raised to increase the processing speed, and operation at a high frequency increases cost and power consumption.
[0007]
In addition, since the circuit differs depending on the number of taps in the conventional method, there is no degree of freedom. If a circuit is provided for each number of taps, enormous costs are required.
A first object of the present invention is to provide a pixel arithmetic device that performs filtering with a variable number of taps and achieves higher processing speed without raising the operating frequency.
[0008]
A second object of the present invention is to provide a pixel arithmetic device that can be used not only for filtering processing but also for MC (motion compensation) processing, and which has a reduced circuit scale.
A third object is to provide a pixel arithmetic device that can be used not only for filtering processing but also for ME (motion prediction) processing, and which has a reduced circuit scale.
[0009]
A fourth object is to provide a pixel arithmetic device that can be used not only for filtering processing but also for OSD (On Screen Display) processing in digital video equipment, and which has a reduced circuit scale.
[0010]
[Means for Solving the Problems]
A pixel arithmetic device that achieves the first object is a pixel arithmetic device that performs filter processing, and includes N pixel processing means, supply means for supplying N pixel data and filter coefficients, and control means for operating the N pixel processing means in parallel.
Each pixel processing means performs an operation using the pixel data and the filter coefficient supplied by the supply means, then acquires pixel data from an adjacent pixel processing means, performs an operation using the acquired pixel data, and accumulates the operation results. The control means controls the N pixel processing means so that the acquisition of pixel data from the adjacent pixel processing means and the operation and accumulation using the acquired pixel data are repeated a number of times that depends on the number of taps.
[0011]
Here, the N pixel processing units form a first shifter that shifts N pixel data to the right and a second shifter that shifts N pixel data to the left. Each pixel processing means performs an operation using two pixel data shifted out from two adjacent pixel processing means.
The pixel calculation device that achieves the second object supplies pixel data of a difference image and pixel data of a reference frame as pixel data from a supply unit.
[0012]
DETAILED DESCRIPTION OF THE INVENTION
The pixel arithmetic device of the present invention is configured to selectively execute mainly (a) filter processing used for enlarging and reducing an image, (b) motion compensation (hereinafter referred to as MC) processing, (c) OSD (On Screen Display) processing, and (d) motion estimation (hereinafter referred to as ME) processing. For the filter processing of (a), the pixel operation unit does not fix the number of taps but makes it variable, and processes a plurality of pixels (for example, 16 pixels) that are contiguous in the horizontal or vertical direction in parallel. Further, the filtering in the vertical direction is performed in synchronization with the decompression of the compressed moving image data.
[0013]
Hereinafter, the pixel operation unit in the embodiment of the present invention will be described in the following order.
1 Media processor configuration
1.1 Configuration of pixel operation unit
1.2 Configuration of pixel parallel processing unit
2.1 Filter processing
2.2 MC (motion compensation) processing
2.3 OSD (On Screen Display) processing
2.4 ME (motion prediction) processing
3.1 Vertical filter processing (part 1)
3.1.1 1/2 reduction
3.1.2 1/4 reduction
3.2 Vertical filter processing (part 2)
3.2.1 1/2 reduction
3.2.2 1/4 reduction
4 Modifications
<1 Media processor configuration>
The following describes the case where the pixel arithmetic device of the present embodiment is incorporated in a media processor that performs media processing (decompression of compressed audio and video data, compression of audio and video data, and the like). The media processor is mounted in, for example, a set top box that receives digital TV broadcasts, a television receiver, or a DVD recording/playback apparatus.
[0014]
FIG. 2 is a block diagram illustrating a configuration of a media processor including a pixel operation unit. In the figure, a media processor 200 includes a dual port memory 100, a stream unit 201, an input/output buffer (hereinafter abbreviated as I/O buffer) 202, a setup processor 203, a bit stream FIFO 204, a variable length code decoding unit (VLD) 205, a transfer engine (hereinafter referred to as TE) 206, a pixel operation unit A (hereinafter referred to as POUA) 207, a pixel operation unit B (hereinafter referred to as POUB) 208, a POUC 209, an audio unit 210, an input/output processor (hereinafter referred to as IOP) 211, a video buffer memory 212, a video unit 213, a host unit 214, an RE 215, and a filter unit 216.
[0015]
The dual port memory 100 includes an input/output port for the external memory 220 (hereinafter referred to as the external port), an input/output port for the inside of the media processor 200 (hereinafter referred to as the internal port), and a cache memory. It receives, through the internal port, access requests from the components of the media processor 200 that read and write data in the external memory 220 (hereinafter referred to as master devices), and accesses the external memory 220 according to the received requests. At that time, the dual port memory 100 caches part of the data of the external memory 220 in the internal cache memory. The external memory 220 is a memory such as SDRAM or RDRAM, and temporarily stores compressed moving image data, compressed audio data, decoded audio data, decoded moving image data, and the like.
[0016]
The stream unit 201 inputs stream data (a so-called MPEG stream) from the outside, separates the input stream data into a video elementary stream and an audio elementary stream, and writes them into the I / O buffer 202.
The I / O buffer 202 is a buffer memory that temporarily holds a video elementary stream, an audio elementary stream, and audio data (expanded audio data). The video elementary stream and the audio elementary stream are respectively stored in the I / O buffer 202 from the stream unit 201 and further stored in the external memory 220 via the dual port memory 100 under the control of the IOP 211. Audio data is stored in the I / O buffer 202 from the external memory 220 via the dual port memory 100 under the control of the IOP 211.
[0017]
The setup processor 203 performs decoding (decompression) of the audio elementary stream and header analysis of the macro block of the video elementary stream. The audio elementary stream and the video elementary stream are transferred from the external memory 220 to the bit stream FIFO 204 via the dual port memory 100 under the control of the IOP 211. The setup processor 203 reads and decodes the audio elementary stream from the bit stream FIFO 204 and stores the decoded audio data in the setup memory 217. Audio data in the setup memory 217 is transferred to the external memory 220 via the dual port memory 100 by the IOP 211. Also, the setup processor 203 reads the video elementary stream from the bit stream FIFO 204, analyzes the macroblock header, and notifies the VLD 205 of the analysis result.
[0018]
The bit stream FIFO 204 is a FIFO memory for supplying a video elementary stream to the variable length code decoding unit 205 and an audio elementary stream to the setup processor 203. The video elementary stream and the audio elementary stream are transferred from the external memory 220 to the bit stream FIFO 204 via the dual port memory 100 under the control of the IOP 211.
[0019]
The VLD 205 decodes a variable length code included in the video elementary stream supplied from the bit stream FIFO 204. This decoding result is a DCT coefficient group in units of macroblocks.
The TE 206 performs IQ (inverse quantization) processing and IDCT (inverse DCT) processing for each macroblock on the decoding result of the VLD 205. These processing results are macroblocks. One macro block is composed of four luminance blocks (Y1 to Y4) and two color difference blocks (Cb, Cr). One block is 8 × 8 pixels. However, for a P picture and a B picture, one block is output from the TE 206 as 8 × 8 difference values. The TE 206 stores the decoding result in the external memory 220 via the dual port memory 100.
[0020]
The POUA 207 selectively executes mainly (a) filter processing, (b) MC processing, (c) OSD processing, (d) motion estimation processing, and the like.
In the filter processing of (a), the POUA 207 filters 16 pixel data included in the video data (frame data) stored in the external memory 220 in parallel, and reduces or enlarges the image by thinning out or interpolating the 16 filtered pixels. The reduced or enlarged data is stored in the external memory 220 via the dual port memory 100 under the control of the POUC 209.
[0021]
In the MC processing of (b), the POUA 207 adds, 16 at a time in parallel, the IQ and IDCT processing results (that is, the difference values of the pixel data) that the TE 206 has stored in the external memory 220 for P pictures and B pictures to the pixel data in the reference frame. The 16 pairs of difference values and pixel data are input to the POUA 207 by the POUC 209 in accordance with the motion vector detected by the macroblock header analysis in the setup processor 203.
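As a rough illustration of this 16-wide addition for the single-reference case, the following Python sketch adds one row of difference values to one row of reference pixels. The function name is hypothetical. The passage above does not describe saturation or rounding, so none is modelled here.

```python
def mc_add_16(differences, reference_pixels):
    """Add 16 difference values to 16 reference-frame pixels, element by element.

    Each list element corresponds to one of the 16 lanes working in parallel.
    Assumption: plain addition only, with no clamping to a pixel range.
    """
    if len(differences) != 16 or len(reference_pixels) != 16:
        raise ValueError("expected 16 difference values and 16 reference pixels")
    return [d + p for d, p in zip(differences, reference_pixels)]


row_differences_example = [1, -2] * 8   # illustrative IQ/IDCT output values
row_reference_example = [100] * 16      # illustrative reference-frame pixels
decoded_row = mc_add_16(row_differences_example, row_reference_example)
```

Repeating this for the 16 lines of a macroblock corresponds to the macroblock-unit processing described for the pixel operation unit.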
[0022]
In the OSD processing of (c), the POUA 207 inputs an OSD image (still image) stored in the external memory 220 or the like via the dual port memory 100 and overwrites the display frame data in the external memory 220. Here, the OSD image means a menu image displayed according to a user's remote control operation, a time display, a channel number display, or the like.
[0023]
The ME processing of (d) searches a reference frame for a rectangular area that is highly correlated with the macroblock to be encoded in uncompressed frame data, and obtains a motion vector that points from the macroblock to be encoded to the rectangular area with the highest correlation. The POUA 207 calculates, 16 at a time in parallel, the differences between the pixels of the macroblock to be encoded and the pixels of a rectangular area in the search area.
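The following Python sketch shows one way such per-row differences could feed a block search. The passage only states that 16 differences are computed in parallel. Using the sum of absolute differences as the correlation measure and an exhaustive search over a square window are assumptions made for the illustration, as are all function and parameter names.

```python
def row_differences(current_row, candidate_row):
    """Differences for one row of pixels (one lane per pixel)."""
    return [c - r for c, r in zip(current_row, candidate_row)]


def sad(current_block, candidate_block):
    """Sum of absolute differences over a block (assumed correlation measure)."""
    return sum(abs(d)
               for cur, cand in zip(current_block, candidate_block)
               for d in row_differences(cur, cand))


def best_motion_vector(current_block, reference, top, left, radius):
    """Exhaustive search around (top, left); returns ((dy, dx), cost)."""
    size = len(current_block)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > len(reference) or x + size > len(reference[0]):
                continue  # candidate area would leave the reference frame
            candidate = [row[x:x + size] for row in reference[y:y + size]]
            cost = sad(current_block, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    if best is None:
        raise ValueError("no candidate area fits inside the reference frame")
    return best[1], best[0]


# Illustrative data: a 12x12 reference frame and a 4x4 block copied from (3, 5).
reference_frame = [[(31 * r + 17 * c) % 101 for c in range(12)] for r in range(12)]
block = [row[5:9] for row in reference_frame[3:7]]
vector, cost = best_motion_vector(block, reference_frame, 4, 4, 2)
```

A small 4x4 block is used here only to keep the example short. The macroblocks in the text are 16 pixels wide, which matches the 16 parallel lanes.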
[0024]
The POUB 208 has the same configuration as the POUA 207, and dynamically shares the processes (a) to (d).
The POUC 209 controls the supply of pixel data groups to the POUA 207 and the POUB 208 and the transfer of processing results to the external memory 220.
The audio unit 210 outputs audio data stored in the I / O buffer 202.
[0025]
The IOP 211 controls data input / output (data transfer) in the media processor 200. There are the following types of data transfer. The first is to transfer the stream data stored in the I / O buffer 202 to the stream buffer area in the external memory 220 via the dual port memory 100. Secondly, the video elementary stream and the audio elementary stream stored in the external memory 220 are transferred to the bit stream FIFO 204 via the dual port memory 100. Thirdly, the audio data stored in the external memory 220 is transferred to the I / O buffer 202 via the dual port memory 100.
[0026]
The video unit 213 reads pixel data for two or three lines from the video data (image frame) in the external memory 220, stores it in the video buffer memory 212, converts the pixel data for those two or three lines into a video signal, and outputs the signal to an externally connected display device such as a television receiver.
A host unit (HOST) 214 receives an instruction from an external host microcomputer, and controls start / end of MPEG decoding, MPEG encoding, OSD processing, reduction / enlargement processing, and the like according to the instruction.
[0027]
A rendering engine (RE) 215 is a master device and performs rendering processing in computer graphics. Data input / output is performed when the dedicated LSI 218 is connected to the outside.
The filter unit 216 performs enlargement and reduction of still image data. Data input / output is performed when the dedicated LSI 218 is connected to the outside.
[0028]
The above description has mainly covered the case where the media processor receives stream data from the stream unit 201 and decodes (decompresses) it. When uncompressed video data and audio data are encoded (compressed), the flow is reversed. In that case, the POUA 207 (or POUB 208) performs ME processing, the TE 206 performs DCT processing and Q (quantization) processing, and the VLD 205 performs variable length coding.
<1.1 Pixel Arithmetic Unit Configuration>
FIG. 3 is a block diagram illustrating a configuration of the pixel calculation unit.
[0029]
Since the POUA 207 and the POUB 208 have the same configuration, the POUA 207 will be described here.
As shown in the figure, the POUA 207 includes a pixel parallel processing unit 21, an input buffer group 22, an output buffer group 23, an instruction memory 24, an instruction decoder 25, an instruction circuit 26, and a DDA circuit 27.
[0030]
The pixel parallel processing unit 21 includes a pixel transfer unit 17, 16 pixel processing units 1 to 16, and a pixel transfer unit 18. It performs (a) filter processing, (b) MC processing, (c) OSD processing, and (d) ME processing on the plurality of pixels input from the input buffer group 22, and outputs the results to the output buffer group 23. Each of the processes (a) to (d) is completed in units of macroblocks, that is, by repeating the processing of 16 pixels 16 times (for 16 lines). The activation of each process is controlled by the POUC 209. In the filter processing, the pixel transfer unit 17 holds a plurality of pixels (eight pixels here) to the left of (or above) the 16 pixels and shifts them out to the right on every clock. The pixel transfer unit 18 holds a plurality of pixels (eight pixels here) to the right of (or below) the 16 pixels and shifts them out to the left on every clock.
[0031]
The input buffer group 22 holds a plurality of pixels to be processed transferred from the dual port memory 100 under the control of the POUC 209, and further holds a filter coefficient in the filter processing.
The output buffer group 23 arbitrarily changes the arrangement of the processing results (16 processing results corresponding to 16 pixels) by the pixel parallel processing unit 21 and temporarily holds them. In the filter processing, the pixel arrangement is changed and held to perform pixel thinning (during reduction) or interpolation (during enlargement).
[0032]
The instruction memory 24 stores a microprogram for filter processing (filter μP), a microprogram for MC processing (MCμP), a microprogram for OSD processing (OSDμP), and a microprogram for ME processing (MEμP). In addition, the instruction memory 24 stores a microprogram for macroblock format conversion, a microprogram for converting the numerical representation of pixels, and the like. Here, the macroblock format is the ratio of the sampling rates of the pixels of the Y, Cb, and Cr blocks, such as "4:2:0", "4:2:2", and "4:4:4", defined in the MPEG standard. The numerical representation of a pixel is either 0 to 255 (general MPEG data and the like) or -128 to 127 (DV cameras and the like).
[0033]
The instruction decoder 25 sequentially reads and decodes the microcode in the microprogram from the instruction memory 24, and controls each part in the POUA 207 according to the decoding result.
The instruction circuit 26 receives an instruction (start address or the like) from the POUC 209 about which microprogram in the instruction memory 24 should be activated, and activates the designated microprogram.
[0034]
The DDA circuit 27 performs selection control of the filter coefficient group held in the input buffer group 22 in the filter processing.
<1.2 Configuration of Pixel Parallel Processing Unit>
FIGS. 4 and 5 are block diagrams showing detailed configurations of the left half and the right half of the pixel parallel processing unit.
[0035]
In FIG. 4, the pixel transfer unit 17 is composed of eight input ports A1701 to H1708, eight delay units A1709 to H1716 that each hold pixel data and delay it by one clock, and seven selection units A1717 to G1723 that each select either the pixel data from an input port or the output of the delay unit to the left. The eight pixels input in parallel from the input buffer group 22 are held in the eight delay units, and the unit functions as a right shifter that shifts the held pixels to the right in synchronization with the clock.
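The load-then-shift behaviour of this right shifter can be modelled with a small Python class. This is a behavioural sketch, not the circuit. The class and method names are hypothetical, and filling the leftmost delay unit with zero after a shift is an assumption, since the passage does not say what that unit receives.

```python
class RightShifter:
    """Behavioural model of an 8-stage right shifter such as pixel transfer unit 17."""

    def __init__(self, depth=8):
        self.depth = depth
        self.delay_units = [0] * depth

    def load(self, pixels):
        """Parallel load: the selection units pass the input ports through."""
        if len(pixels) != self.depth:
            raise ValueError("one pixel per delay unit is required")
        self.delay_units = list(pixels)

    def clock(self):
        """Shift right by one stage and return the pixel that is shifted out."""
        shifted_out = self.delay_units[-1]
        # Assumption: the leftmost delay unit is refilled with zero.
        self.delay_units = [0] + self.delay_units[:-1]
        return shifted_out


shifter = RightShifter()
shifter.load([1, 2, 3, 4, 5, 6, 7, 8])          # stands in for X1..X8
outputs = [shifter.clock() for _ in range(8)]   # values handed to the neighbouring unit
```

With X1 to X8 loaded, the neighbouring pixel processing unit receives X8 first and X1 last, which matches the order in which the left-hand pixels appear in the filter expressions for the pixel processing unit 1.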
[0036]
In FIG. 5, the pixel transfer unit 18 is different from the pixel transfer unit 17 in that the shift direction is to the left.
Since the 16 pixel processing units 1 to 16 in FIGS. 4 and 5 have the same configuration, the pixel processing unit 2 will be described as a representative.
The pixel processing unit 2 includes an input port A201 to an input port C203, selection units A204 and B205, delay units A206 to D209, an adder A120, a multiplier A211, an adder B212, and an output port D213.
[0037]
The selection unit A204 selects either the pixel data input from the input port A201 or the pixel data shifted out from the right adjacent pixel processing unit 3.
The selection unit A204 and the delay unit A206 also serve to shift the pixel data input from the right adjacent pixel processing unit 3 out to the left adjacent pixel processing unit 1.
The selection unit B205 selects either the pixel data input from the input port B202 or the pixel data shifted out from the left adjacent pixel processing unit 1.
[0038]
The selection unit B205 and the delay unit B207 also perform a function of shifting and outputting the pixel data input from the left adjacent pixel processing unit 1 to the right adjacent pixel processing unit 3.
The delay unit A206 and the delay unit B207 hold the selected pixel data in the selection unit A204 and the selection unit B205, respectively.
The delay unit C208 holds the pixel data from the input port C203.
The adder A120 adds the pixel data output from the delay device A206 and the delay device B207.
[0039]
Multiplier A211 multiplies the addition result of adder A120 and the pixel data from delayer C208. The multiplier A211 is used for multiplication of pixel data and a filter coefficient in the filter processing.
The adder B212 adds the multiplication result of the multiplier A211 to the data of the delay unit D209.
[0040]
The delay unit D209 accumulates the addition result of the adder B212.
The pixel processing unit 2 executes the above (a) filter processing, (b) MC processing, (c) OSD processing, and (d) ME processing by selectively operating these components. The operation of selectively combining these components is performed by microprogram control by the instruction memory 24 and the instruction decoder 25.
[0041]
FIG. 6A is a block diagram showing a detailed configuration of the input buffer group 22.
As shown in the figure, the input buffer group 22 includes eight latches 221 that supply pixel data to the pixel transfer unit 17, 16 latch units 222 that supply pixel data to the pixel processing units 1 to 16, and eight latches 223 that supply pixel data to the pixel transfer unit 18. Pixel data groups are transferred to these from the external memory 220 via the dual port memory 100 under the control of the POUC 209.
[0042]
Each latch unit 222 includes two latches that supply pixel data to the input ports A and B of the pixel processing unit, and a selection unit 224 that supplies pixel data or a filter coefficient to the input port C of the pixel processing unit.
FIG. 6B is a block diagram illustrating a detailed configuration of the selection unit 224.
As shown in the figure, the selection unit 224 includes eight latches 224a to 224h and a selector 224i that selects any one of data from the eight latches.
[0043]
The latches 224a to 224h hold the filter coefficients a0 to a7 (or a0 / 2, a1 to a7) in the filtering process. These filter coefficients are transferred from the external memory 220 to the latches 224a to 224h by the POUC 209 via the dual port memory 100.
The selector 224i selects the latches 224a to 224h one after another in synchronization with the clock under the control of the DDA circuit 27. Because the supply of filter coefficients to the pixel processing units is thus controlled in hardware by the DDA circuit 27 rather than directly by microcode, the processing is faster.
[0044]
FIG. 7 is a block diagram showing the configuration of the output buffer group 23.
As shown in the figure, the output buffer group 23 includes 16 selectors 24a to 24p and 16 latches 23a to 23p.
Each of the selectors 24a to 24p receives 16 processing results of the pixel processing units 1 to 16, and selects one of them. This selection control is performed by the instruction decoder 25.
[0045]
The latches 23a to 23p hold the selection results of the selectors 24a to 24p, respectively.
For example, when the filter processing results are to be reduced to 1/2, the processing results of the pixel processing units 1, 3, 5, ..., 15, out of the 16 processing results of the pixel processing units 1 to 16 for 16 pixels, are selected by the eight selectors 24a to 24h and stored in the latches 23a to 23h. Then, out of the 16 processing results for the next 16 pixels, the processing results of the pixel processing units 2, 4, 6, ..., 16 are selected by the eight selectors 24i to 24p and stored in the latches 23i to 23p. In this way the pixel data is thinned out, and 16 pixel data reduced to 1/2 are held in the output buffer group 23 and then transferred to the external memory 220 via the dual port memory 100 under the control of the POUC 209.
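The selection pattern for 1/2 reduction can be written out as a short Python sketch. It models only which of the 16 results end up in which latch, following the example above. The function name and the list representation of the latches 23a to 23p are assumptions.

```python
def pack_half_reduction(first_results, second_results):
    """Fill 16 output latches from two consecutive sets of 16 filter results.

    first_results / second_results: outputs of pixel processing units 1..16
    (index 0 is unit 1) for two successive groups of 16 pixels.
    """
    if len(first_results) != 16 or len(second_results) != 16:
        raise ValueError("expected 16 results per pass")
    latches = [None] * 16
    latches[0:8] = first_results[0::2]     # units 1, 3, ..., 15 -> latches 23a..23h
    latches[8:16] = second_results[1::2]   # units 2, 4, ..., 16 -> latches 23i..23p
    return latches


first_pass = list(range(1, 17))        # illustrative results, value = unit number
second_pass = list(range(101, 117))    # illustrative results, value = 100 + unit number
packed = pack_half_reduction(first_pass, second_pass)
```

Two passes of the pixel parallel processing unit therefore fill the output buffer group once, so one 16-pixel transfer to the external memory carries the reduced result of 32 input pixels.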
<2.1 Filter processing>
Details of the filter processing in the pixel operation unit will be described.
[0046]
The POUC 209 identifies a macroblock to be filtered, transfers 32 pixel data and the filter coefficients a0/2, a1 to a7 to the input buffer group 22 as initial values for the POUA 207 or POUB 208, and instructs the instruction circuit 26 to start the filter processing, notifying it of the number of taps.
FIG. 8 is a diagram illustrating initial input values of pixel data when the pixel processing unit (POUA 207) performs filter processing. In the figure, the input port column means each input port shown in FIGS. The input pixel column means pixel data supplied from the input buffer group 22 to each input port. The output port column means the output port D (adder B output) shown in FIGS. 4 and 5, and the output pixel column means the output value.
[0047]
In the input buffer group 22, which supplies pixel data to the input ports, 32 pixel data X1 to X32 that are contiguous in the horizontal direction are transferred by the POUC 209 and held, as shown in FIG. The targets of the filter processing here are the 16 pixel data X9 to X24. As shown in FIG. 8, the pixel data X9 to X24 are supplied to the input ports A and B of the pixel processing units 1 to 16, and the filter coefficient a0/2 selected in the input buffer group 22 is supplied to the input port C as an initial value.
[0048]
After the initial input values have been supplied from the input buffer group 22 to the pixel parallel processing unit 21, the filter processing proceeds for a number of clock inputs that corresponds to the desired number of taps.
FIG. 10 is an explanatory diagram showing a calculation process of the pixel processing unit 1 as a representative of the 16 pixel processing units. In the figure, the contents held by the delay units A to D in the pixel processing unit 1 and the output value of the adder B are shown for each number of input clocks. FIG. 11 is a diagram illustrating an output value of the output port D (adder B output) for each clock input of the pixel processing unit 1.
In the pixel processing unit 1, at the first clock input (CLK1), the delay units A and B hold the pixel data X9 as the initial input value, the delay unit C holds the filter coefficient a0/2, and the delay unit D is cleared to zero. At this time, the selection units A and B both select their input ports. As a result, the adder A outputs (X9 + X9), the multiplier A outputs (X9 + X9) * a0/2, and the adder B outputs (X9 + X9) * a0/2 + 0, that is, a0 * X9 (see FIG. 11).
[0049]
After the second clock input (CLK2), the selection units A and B select the shift output from the adjacent pixel processing unit or pixel transfer unit instead of the input ports A and B.
At the second clock input (CLK2), the delay units A to D hold the pixel data X10, the pixel data X8, the filter coefficient a1, and the accumulated value a0 * X9, respectively. As a result, the adder B outputs a0 * X9 + a1 (X10 + X8) (see FIG. 11). That is, at the second clock, the multiplier A multiplies the filter coefficient a1 (delay unit C) by the sum of the pixel data shifted out from both sides (adder A), and the adder B adds the multiplication result to the accumulated value in the delay unit D.
[0050]
At the third clock input (CLK3), the pixel processing unit 1 operates in the same way as at the second clock input, so the adder B outputs a0 * X9 + a1 (X10 + X8) + a2 (X11 + X7).
By performing the same operation at the fourth to ninth clock inputs (CLK4 to CLK9), the adder B outputs the output values shown in FIG.
[0051]
In this way, the filter processing result (output data) of the pixel processing unit 1 after 9 clocks is as follows.
a0 * X9 + a1 (X10 + X8) + a2 (X11 + X7) + a3 (X12 + X6)
+ a4 (X13 + X5) + a5 (X14 + X4) + a6 (X15 + X3) + a7 (X16 + X2) + a8 (X17 + X1)
[0052]
FIGS. 10 and 11 show the processing steps up to CLK9, but under the control of the instruction decoder 25 the clock input stops at a count that depends on the number of taps notified by the POUC 209. That is, each pixel processing unit ends the filter processing at CLK2 when the number of taps is 3, at CLK3 when the number of taps is 5, and at CLK4 when the number of taps is 7. In other words, filter processing with (2n-1) taps ends after n clock inputs.
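The clock-by-clock accumulation described above can be checked with a behavioural Python sketch of the 16 pixel processing units and the two pixel transfer units. It is a simplified software model, not the hardware. The function name, the list representation of the shift chains, and the zero fill at the ends of the chains are assumptions.

```python
def filter_16_parallel(pixels, coeffs, taps):
    """Filter X9..X24 out of pixels X1..X32 with a symmetric (2n-1)-tap filter.

    pixels: 32 values X1..X32 (pixels[0] is X1).
    coeffs: [a0, a1, ...], a0 being the center coefficient.
    taps:   odd tap count 2n-1; the loop runs for n clocks.
    """
    if len(pixels) != 32 or taps % 2 == 0:
        raise ValueError("expected 32 pixels and an odd tap count")
    n_clocks = (taps + 1) // 2
    if n_clocks > 9 or len(coeffs) < n_clocks:
        raise ValueError("tap count exceeds the 8-pixel margins or the coefficient list")

    # Chain shifted right: pixel transfer unit 17 (X1..X8), then delay unit B of units 1..16.
    b_chain = list(pixels[0:24])
    # Chain shifted left: delay unit A of units 1..16, then pixel transfer unit 18 (X25..X32).
    a_chain = list(pixels[8:32])
    acc = [0] * 16  # delay unit D of each unit, cleared to zero

    for clk in range(1, n_clocks + 1):
        if clk == 1:
            coeff = coeffs[0] / 2          # A and B both hold the center pixel, so a0 is halved
        else:
            b_chain = [0] + b_chain[:-1]   # every unit takes the pixel of its left neighbour
            a_chain = a_chain[1:] + [0]    # every unit takes the pixel of its right neighbour
            coeff = coeffs[clk - 1]
        for unit in range(16):
            summed = a_chain[unit] + b_chain[8 + unit]   # adder A
            acc[unit] = summed * coeff + acc[unit]       # multiplier A, adder B, delay unit D
    return acc


ramp = list(range(1, 33))                          # X1..X32 = 1..32, illustrative values
seven_tap = filter_16_parallel(ramp, [4, 3, 2, 1], 7)
```

In this model a 7-tap filter finishes after 4 passes of the clock loop, in line with the (2n-1)-taps-in-n-clocks rule, and all 16 outputs are produced together rather than one per clock.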
[0053]
The instruction decoder 25 repeats 16 pixel parallel processing for 16 lines, thereby completing the filter processing for 4 blocks. At this time, the 16 filter processing results are reduced or enlarged by being thinned or interpolated in the output buffer group 23. Every time 16 pixel groups after reduction or enlargement of the output buffer group 23 are held, they are transferred to the external memory 220 via the dual port memory 100 under the control of the POUC 209. Further, the instruction decoder 25 notifies the POUC 209 of the end when the 16th line ends. The POUC 209 instructs the POUA 207 to supply the initial input value and the filter coefficient and start the filter process in the same manner as described above for the next macroblock.
[0054]
Note that the filter processing result of the pixel processing unit 2 after 9 clocks is given by the following expression.
a0 * X10 + a1 (X11 + X9) + a2 (X12 + X8) + a3 (X13 + X7)
+ a4 (X14 + X6) + a5 (X15 + X5) + a6 (X16 + X4) + a7 (X17 + X3) + a8 (X18 + X2)
The filter processing result of the pixel processing unit 3 after 9 clocks is given by the following expression.
a0 * X11 + a1 (X12 + X10) + a2 (X13 + X9) + a3 (X14 + X8)
+ a4 (X15 + X7) + a5 (X16 + X6) + a6 (X17 + X5) + a7 (X18 + X4) + a8 (X19 + X3)
The filter processing results of the pixel processing units 4 to 16 are the same except that the pixel positions are different.
[0055]
As described above, the pixel parallel processing unit 21 can perform the filter processing on the 16 input pixels in parallel, and can arbitrarily set the number of taps by controlling the number of input clocks.
In FIG. 8, the inputs to the input ports A, B, and C of the pixel processing unit 1 are (X9, X9, a0 / 2), but (X9, 0, a0) or (0, X9, a0) may be used instead. The same applies to the pixel processing units 2 to 16, except that the target pixels are different.
<2.2 MC (Motion Compensation) Processing>
Details of MC processing when the decoding target frame is a P picture will be described.
[0056]
The POUC 209 instructs the instruction circuit 26 to start MC processing, specifies the macroblock (difference values) to be MC processed in the decoding target frame and the rectangular area pointed to by the motion vector in the reference frame, and sets the 16 difference values D1 to D16 and the 16 pixel data P1 to P16 of the rectangular area in the input buffer group 22 of the POUA 207 or the POUB 208.
[0057]
FIG. 12 is a diagram showing input / output pixel data when MC processing (P picture) is performed in the pixel calculation unit. In the figure, the input port column means the input ports of the pixel transfer unit 17, the pixel processing units 1 to 16, and the pixel transfer unit 18 shown in FIGS. 4 and 5. The input pixel column means pixel data input to the input port. Since the pixel transfer units 17 and 18 are not used in the MC process, the input pixel may be anything (don't care). The output port column means the output port D (adder B output) shown in FIGS. 4 and 5, and the output pixel column means the output value.
[0058]
FIG. 13 is an explanatory diagram of input pixels to the pixel processing units 1 to 16 in the MC processing. As shown in the figure, D1 to D16 are 16 difference values in the macroblock (MB) of the decoding target frame. P1 to P16 are 16 pieces of pixel data in the rectangular area indicated by the motion vector in the reference frame.
In the MC processing, the selection units A and B in the pixel processing units 1 to 16 always select the input ports A and B, respectively. Thereby, the pixel data from the input port A and the difference value from the input port B are input to the delay units A and B via the selection units A and B, held, and further added by the adder A. The addition result is multiplied by 1 by the multiplier, 0 is added by the adder B, and the result is output from the output port D. That is, the pixel data from the input port A and the difference value from the input port B are simply added and output from the output port D.
[0059]
Further, the 16 addition results are stored in the output buffer group 23 and written back to the decoding target frame in the external memory 220 via the dual port memory 100 by the POUC 209.
MC processing is performed by repeating the above processing in units of 16 pixels of the decoding target frame. Each pixel processing unit simply performs addition, and an addition result of 16 pixels can be obtained for each clock.
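As a rough illustration of the P-picture datapath just described, the following Python sketch models one clock of the 16 pixel processing units. The function name `mc_p_picture` and the sample values are hypothetical, and the sketch deliberately models only what the text states (add, multiply by 1, add 0).

```python
def mc_p_picture(pred, diff):
    """One clock of P-picture MC over a group of pixel processing units."""
    out = []
    for p, d in zip(pred, diff):
        sum_a = p + d          # adder A: port A (pixel) + port B (difference)
        prod = sum_a * 1       # multiplier A: multiply by 1
        out.append(prod + 0)   # adder B: add 0, result appears at output port D
    return out

P = list(range(100, 116))                  # P1..P16, arbitrary values
D = [(-1) ** i * i for i in range(16)]     # D1..D16, arbitrary values
reconstructed = mc_p_picture(P, D)
```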
[0060]
Next, MC processing when the decoding target frame is a B picture will be described.
FIG. 14 is a diagram showing input / output pixel data when MC processing (B picture) is performed in the pixel calculation unit. In the figure, the input port column, input pixel column, output port column, and output pixel column are the same as in FIG. 12. However, the input pixel column differs from FIG. 12 in that the pixels are input in two steps, at the first clock (CLK1) and the second clock (CLK2).
[0061]
P1 to P16 and B1 to B16 are 16 pieces of pixel data in a rectangular area indicated by a motion vector in two different reference frames.
In the MC processing, the selection units A and B in the pixel processing units 1 to 16 always select the input ports A and B, respectively. In the first clock (CLK1), P1 and B1 from the input ports A and B are held in the delay units A and B via the selection units A and B, and at the same time the constant 1/2 from the input port C is held in the delay unit C. As a result, (P1 + B1) / 2 is obtained from the multiplier A. In the second clock (CLK2), the multiplication result (P1 + B1) / 2 is held in the delay unit D, and at the same time, (1, 0, D1) from the input ports A, B, and C are held in the delay units A, B, and C. Therefore, D1 from the multiplier A and (P1 + B1) / 2 from the delay unit D are added by the adder B. As a result, (P1 + B1) / 2 + D1 is output from the output port.
[0062]
Further, the 16 addition results are stored in the output buffer group 23 and written back to the decoding target frame in the external memory 220 via the dual port memory 100 by the POUC 209.
MC processing for the B picture is performed by repeating the above processing in units of 16 pixels of the decoding target frame.
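The two-clock B-picture sequence can be illustrated with the short Python sketch below. The function name `mc_b_picture` and the sample values are hypothetical, and `Fraction` is used only so that the halved value stays exact; the text does not say how the hardware rounds, so rounding is left out.

```python
from fractions import Fraction

def mc_b_picture(p, b, d):
    """Two-clock B-picture MC for one pixel processing unit."""
    half = Fraction(1, 2)
    # CLK1: ports (A, B, C) = (P, B, 1/2)
    mult_clk1 = (p + b) * half          # adder A, then multiplier A
    delay_d = mult_clk1                 # held in delay unit D at CLK2
    # CLK2: ports (A, B, C) = (1, 0, D)
    mult_clk2 = (1 + 0) * d             # multiplier A outputs D
    return mult_clk2 + delay_d          # adder B: D + (P + B) / 2

P = [100, 102, 104, 106]
B = [110, 100, 108, 107]
D = [3, -2, 0, 5]
out = [mc_b_picture(p, b, d) for p, b, d in zip(P, B, D)]
```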
<2.3 OSD (On Screen Display) Processing>
The POUC 209 instructs the instruction circuit 26 to start OSD processing, and sequentially reads 16 pixel data X1 to X16 from the OSD image held in the external memory 220 and sets them in the input buffer group 22.
[0063]
FIG. 15 is a diagram showing input / output pixel data when OSD (on-screen display) processing is performed in the pixel arithmetic unit.
In the figure, the pixel transfer units 17 and 18 are not used. Pixel data X1 to X16 are input to the input port A of the pixel processing units 1 to 16 from the input buffer group 22, 0 is input to the input port B, and 1 is input to the input port C, respectively. FIG. 16 shows a state in which 16 pixels in the OSD image are sequentially written in the input buffer group 22.
[0064]
The selection units A and B in the pixel processing units 1 to 16 always select the input ports in the OSD process. For example, in the pixel processing unit 1, the pixel data X1 of the input port A and “0” of the input port B are held in the delay units A and B, respectively, and then added by the adder A (X1 + 0 = X1). The addition result is multiplied by “1” input from the input port C by the multiplier A, and “0” is added by the adder B. As a result, the pixel data X1 of the input port A is output from the adder B as it is. Similarly, the pixel data X2 to X16 of the input port A of the pixel processing units 2 to 16 are output from the adder B as they are.
[0065]
The pixel data X1 to X16 output from the adder B are stored in the output buffer group 23, and further overwritten on the display frame data in the external memory 220 via the dual port memory 100 by the POUC 209.
As shown in FIG. 16, the above processing is repeated for the entire OSD image, whereby the OSD image in the external memory 220 is overwritten and copied to the display frame data. This is the simplest of the OSD processes, and the POUA 207 or the POUB 208 simply relays the OSD image in units of 16 pixels.
[0066]
As another form of OSD processing, the OSD image and the display frame data may be blended. When the blend ratio is 0.5, it suffices to supply the pixel data of the OSD image from the input buffer group 22 to each input port A of the pixel processing units 1 to 16, and the pixel data of the display frame data to each input port B.
When the blend ratio is α : (1−α), (pixel data of the OSD image, 0, α) may be supplied from the input buffer group 22 to the input ports A, B, and C of each pixel processing unit in the first clock, and (0, pixel data of the display frame data, 1−α) in the second clock.
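The α : (1−α) blend described above amounts to two multiply-accumulate clocks per pixel. A minimal Python sketch follows, in which the function name `osd_blend` and the sample pixel values are hypothetical and floating-point numbers stand in for whatever fixed-point format the hardware actually uses.

```python
def osd_blend(osd, disp, alpha):
    """Two-clock blend for one pixel: alpha*osd + (1 - alpha)*disp."""
    delay_d = 0.0
    # CLK1: ports (A, B, C) = (osd, 0, alpha)
    delay_d += (osd + 0) * alpha
    # CLK2: ports (A, B, C) = (0, disp, 1 - alpha)
    delay_d += (0 + disp) * (1 - alpha)
    return delay_d

blended = [osd_blend(o, d, 0.25) for o, d in zip([200, 80, 0], [100, 80, 40])]
```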
[0067]
Further, when the OSD image is displayed in a reduced size, the above-described filtering process may be applied to the OSD image from the input buffer group 22 and overwritten from the output buffer group 23 to the position to be reduced and displayed in the display frame data.
Furthermore, the OSD image may be reduced by filtering and then blended.
<2.4 ME (Motion Prediction) Processing>
FIG. 17 is a diagram illustrating input / output pixel data when ME (motion prediction) processing is performed in the pixel arithmetic unit. In the input pixel column of the figure, X1 to X16 are 16 pixels of the macroblock in the encoding target frame, and R1 to R16 are 16 pixels in a 16 × 16 pixel rectangular area in the reference frame. FIG. 18 is an explanatory diagram showing the relationship between these pixels. The motion vector (MV) search range of the reference frame in FIG. 6 is the range searched for motion vectors around the same position as the encoding target macroblock (for example, +16 pixels to −16 pixels in the horizontal and vertical directions). Within this MV search range, a 16 × 16 rectangular area can take 16 × 16 positions for a pixel-unit search and 32 × 32 positions for a half-pel (1/2-pixel) search. FIG. 13 shows only the upper left rectangular area in the MV search range.
[0068]
In the ME processing, the sum of the differences between pixels is obtained between each rectangular area in the MV search range and the encoding target macroblock, and the displacement in relative position between the rectangular area with the smallest sum (that is, the highest correlation) and the encoding target macroblock is determined as the motion vector. The encoding target block is compared with the rectangular region having the highest correlation.
[0069]
Under the control of the POUC 209, the pixel data X1 to X16 to be encoded and the pixel data R1 to R16 of one rectangular area are transferred to the input buffer group 22. As for the pixel data R1 to R16 in the rectangular area, one line in the rectangular area is transferred for each clock. Accordingly, R1 to R16 for 16 lines are transferred for one rectangular area.
According to FIG. 17, for example, the pixel processing unit 1 shown in FIG. 4 performs subtraction and absolute value conversion between the pixel data X1 of the input port A and the pixel data R1 of the input port B in the adder A at the first clock. And passes through the multiplier A (multiplied by 1). The adder B outputs an addition value between the multiplier output and the data held in the delay unit D. In the first clock, the adder B outputs | X1-R1 | on the first line.
[0070]
In the second clock, | X1-R1 | of the first line is held in the delay unit D. Therefore, the adder B adds | X1-R1 | of the second line from the multiplier A and | X1-R1 | of the first line held in the delay unit D.
In the third clock, the accumulated value of | X1-R1 | for the first and second lines is held in the delay unit D, so the adder B adds | X1-R1 | of the third line from the multiplier A and the accumulated value held in the delay unit D.
[0071]
In the 16th clock, the adder B outputs the accumulated value (Σ | X1-R1 |) of | X1-R1 | over the first to 16th lines.
Accumulated values (Σ | X2-R2 |) to (Σ | X16-R16 |) are likewise output from the pixel processing units 2 to 16, respectively.
These 16 accumulated values are held in the output buffer group 23 at the 17th clock, taken out by the POUC 209, and the sum of the 16 accumulated values is calculated and stored in the work area in the external memory 220.
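The accumulation over 16 clocks and the final totaling can be sketched in Python as below. This is an illustrative model only: the function name `sad_16x16`, the nested-list layout, and the sample data are assumptions. The list `acc` plays the role of the delay unit D in each of the 16 pixel processing units, and the total is formed outside the units, as the text describes for the POUC 209.

```python
def sad_16x16(block, region):
    """Sum of absolute differences between two 16x16 pixel arrays.

    block[line][i] plays the role of X(i+1); region[line][i] that of R(i+1).
    """
    acc = [0] * 16                       # delay unit D of each of the 16 units
    for x_line, r_line in zip(block, region):      # one line per clock
        for i in range(16):
            # adder A (subtract, absolute value), multiplier A (x1), adder B
            acc[i] += abs(x_line[i] - r_line[i]) * 1
    return sum(acc), acc                 # the total is formed outside the units

block = [[(l * 16 + i) % 7 for i in range(16)] for l in range(16)]
region = [[(l + i) % 5 for i in range(16)] for l in range(16)]
total, per_unit = sad_16x16(block, region)
```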
[0072]
This completes the calculation of the sum of the differences between the pixel data of one rectangular area and the encoding target macroblock.
Thereafter, the sum of differences is calculated in the same manner for the other rectangular areas in the MV search range. When the sum of differences has been calculated for all rectangular areas (or the necessary rectangular areas) within the MV search range, the rectangular area with the smallest value is determined to be the rectangular area having the highest correlation, and a motion vector is generated.
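The search over rectangular areas can be pictured with the following Python sketch, which evaluates every candidate displacement and keeps the one with the smallest sum of absolute differences. The function name `full_search`, the small search radius, and the synthetic reference frame are assumptions chosen to keep the example short. The text mentions a range of about ±16 pixels and does not prescribe a search order.

```python
def full_search(block, ref, top, left, search=2):
    """Return (dy, dx, sad) of the best-matching 16x16 area in `ref`.

    `top`/`left` locate the macroblock position in `ref`; `search` is the
    search radius in pixels (kept small here for illustration).
    """
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + 16 > len(ref) or x + 16 > len(ref[0]):
                continue                       # candidate falls outside the frame
            sad = sum(abs(block[r][c] - ref[y + r][x + c])
                      for r in range(16) for c in range(16))
            if best is None or sad < best[2]:
                best = (dy, dx, sad)
    return best

ref = [[(7 * r + 13 * c + r * c) % 251 for c in range(24)] for r in range(24)]
block = [row[5:21] for row in ref[3:19]]       # block copied from offset (3, 5)
mv = full_search(block, ref, top=4, left=4)
```

Here the block was copied from one row above and one column to the right of the nominal position, so the search is expected to recover that displacement with a sum of zero.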
[0073]
In the ME processing described above, the 16 accumulated values from the pixel processing units are totaled separately, but the total may instead be calculated in the pixel processing units 1 to 16. In that case, the 16 accumulated values for each rectangular area are stored as they are from the output buffer group 23 into the work area of the external memory 220. Once the accumulated value groups for 16 or more rectangular areas are stored in this work area, each of the pixel processing units 1 to 16 may take charge of one rectangular area and sequentially accumulate its 16 accumulated values to obtain the sum of the differences.
[0074]
In the ME processing described above, the difference is calculated in units of pixels, but it may be calculated in units of half pels. In that case, for real lines, | X1-R1 | is calculated one line per clock as described above. For half-pel lines, for example, the half-pel pixel value ((R1 + R1 ′) / 2) may be calculated in the first of two clocks, and the difference | X1− (R1 + R1 ′) / 2 | in the next clock. Alternatively, the half-pel pixel value ((R1 + R1 ′ + R2 + R2 ′) / 4) may be calculated in 4 clocks out of 5 clocks, and the difference in the next clock.
<3.1 Vertical Filter Processing (Part 1)>
FIG. 19 is a schematic block diagram of the media processor showing the flow of data when performing vertical filter processing in the media processor shown in FIG.
[0075]
In the figure, a decoder unit 301 corresponds to the VLD 205, the TE 206, and the POUA 207 (MC processing) in FIG. 2, and decodes (decompresses) the video elementary stream.
The frame memory 302 corresponds to the external memory 220 and holds video data (frame data) as a decoding result.
The vertical filter 303 corresponds to the POUB 208 and performs vertical reduction by vertical filter processing.
The buffer memory 304 corresponds to the external memory 220 and holds reduced video data (frame data for display).
[0076]
The image output unit 305 corresponds to the video buffer memory 212 and the video unit 213, converts display frame data into a video signal, and outputs the video signal.
The POUA 207 is responsible for MC processing and the POUB 208 is responsible for vertical filter processing. Further, it is assumed that the horizontal reduction by the horizontal filter processing is performed by one of the POUA 207 and the POUB 208 with respect to the decoded frame data of the frame memory 302.
<3.1.1 1/2 reduction>
FIG. 20 is a diagram showing temporal changes in the data supply states of the frame memory 302 and the buffer memory 304 when the 1/2 reduction process is performed in FIG.
[0077]
In FIG. 20, the vertical axes of the graphs 701 to 703 indicate time in units of the period V of the vertical synchronization signal of the field. Five periods are shown in the figure, and the time axes of the graphs 701 to 703 are the same. The horizontal axis of the graph 701 indicates the amount of data in the frame memory 302. The horizontal axis of the graph 702 indicates the amount of data in the buffer memory 304. The graph 703 shows the frame (field) being output by the image output unit 305.
[0078]
A solid line 704 in the graph 701 indicates the amount of frame data supplied from the decoder unit 301 to the frame memory 302. A broken line 705 indicates the amount of frame data supplied from the frame memory 302 to the vertical filter unit 303.
A broken line 706 in the graph 702 indicates the supply amount of the 1st field reduced image from the vertical filter unit 303 to the buffer memory 304. An alternate long and short dash line 707 indicates the supply amount of the 2nd field reduced image from the vertical filter unit 303 to the buffer memory 304.
[0079]
A solid line 708 in the graph 702 indicates the supply state of 1st field reduced image data from the buffer memory 304 to the image output unit 305. In the case of 1/2 reduction, the reduced image can be displayed anywhere from the upper half to the lower half of the frame, so the timing of the solid line 708 in the figure differs depending on the display position. Similarly, a solid line 709 indicates the supply state of 2nd field reduced image data from the buffer memory 304 to the image output unit 305.
[0080]
As shown by the graph 701, control is performed so that the supply of the frame data of frame n from the decoder unit 301 to the frame memory 302 starts immediately after the supply of the 2nd field of frame n-1 from the frame memory 302 to the vertical filter unit 303 has started, and is completed immediately before the supply of the 1st field of frame n from the frame memory 302 to the vertical filter unit 303 is completed.
[0081]
As shown by the graph 702, control is performed so that the supply of the 1st field of frame n from the vertical filter unit 303 to the buffer memory 304 is completed while the 2nd field of frame n-1 is displayed, and the supply of the 2nd field of frame n is completed while the 1st field of frame n is displayed.
By controlling the apparatus in this manner, it is sufficient that the path between the decoder unit 301 and the frame memory 302 can transfer one frame of frame data in a 2V period, and that the path between the frame memory 302 and the vertical filter unit 303 can transfer 1/2 frame of frame data in a 1V period. It is sufficient that the decoder unit 301 has the arithmetic capability to generate one frame of frame data in a 2V period, and that the vertical filter unit 303 has the arithmetic capability to filter 1/2 frame of frame data in a 1V period. Between the vertical filter unit 303 and the buffer memory 304, the ability to transfer 1/4 frame of frame data in a 1V period is sufficient. Between the buffer memory 304 and the image output unit 305, the ability to transfer 1/4 frame of frame data in a 1V period is sufficient. The frame memory 302 needs to hold one frame of frame data, and the buffer memory 304 only needs the capacity to hold 1/2 frame of frame data.
[0082]
Next, for comparison with FIG. 20, FIG. 21 shows a time change in the data supply state when the buffer memory 304 is not provided.
When the reduction process is not performed, the supply of the digital image data of frame n to the frame memory 302, indicated by the solid line 506, starts when the supply of the 2nd field of frame n-1 to the vertical filter unit 303, indicated by the broken line 507, has started, and ends before the supply of the 1st field of frame n to the vertical filter unit 303, indicated by the broken line 508, is completed. Therefore, one frame of digital image data is supplied at a constant rate during the 2V period shown on the graph of FIG. 21.
[0083]
Also, the supply of the digital image data of the 1st field of frame n from the frame memory 302 to the vertical filter unit 303, indicated by the broken line 508, is completed immediately after the supply of the digital image data of frame n to the frame memory 302, indicated by the solid line 511, ends, and then the processing of the 2nd field is started. For this reason, one field of digital image data is supplied from the frame memory 302 to the vertical filter unit 303 at a constant rate during the 1V period shown in the graph of FIG. 21.
[0084]
However, when the 1/2 reduction process is performed, the timing at which the supply of the digital image data of frame n to the frame memory 302 can be started differs depending on the display position of the 2nd field of frame n-1. Depending on that display position, the 2nd field of frame n-1 is supplied from the frame memory 302 to the vertical filter unit 303 somewhere between the broken lines 509 and 510, and the timing at which the supply of frame n to the frame memory 302 can be started is most delayed in the case of the display position indicated by the broken line 510. In this case, the 1/2 reduced image is output to the lower half of the image output unit 501. Further, the supply of the digital image data of frame n to the frame memory 302 must be completed before the supply of the 1st field of frame n to the vertical filter unit 303, indicated by the broken line 511, is completed. For this reason, one frame of digital image data must be supplied at a constant rate during the 1V period shown in the graph of FIG. 21, and twice the supply capability is required compared with the case where no reduction is performed.
[0085]
In addition, the supply of the digital image data of the 1st field of frame n from the frame memory 302 to the vertical filter unit 303, indicated by the broken line 511, is completed immediately after the supply of the digital image data of frame n to the frame memory 302, indicated by the solid line 512, ends, and then the processing of the 2nd field is started. Therefore, one field of digital image data must be supplied at a constant rate during the 1/2V period shown in the graph of FIG. 5, and twice the supply capability is required compared with the case where no reduction is performed. Since the vertical filter unit 303 is also required to keep up with the supplied digital image data, it requires twice the computing power compared with the case where no reduction is performed.
[0086]
For comparison with FIG. 20, FIG. 23 shows the time change in the data supply state when the 1/4 reduction process is performed without the buffer memory 304.
For the same reason as described above, the supply capability of digital image data to the frame memory 302, the supply capability from the frame memory 302 to the vertical filter unit 303, and the calculation capability of the vertical filter unit are required to be four times those needed when the reduction process is not performed. Thus, when the buffer memory 304 is not provided, the required peak performance increases as the image is reduced further.
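The two cases stated above (twice the capability for 1/2 reduction, four times for 1/4 reduction) suggest that, without the buffer memory 304, the required peak capability grows as the reciprocal of the vertical reduction ratio. The short Python sketch below encodes that reading. The reciprocal rule for other ratios is an extrapolation made for illustration, and the function name `peak_factor_without_buffer` is hypothetical.

```python
from fractions import Fraction

def peak_factor_without_buffer(reduction):
    """Required capability relative to no reduction, with no buffer memory 304.

    Assumes the factor is the reciprocal of the vertical reduction ratio,
    which matches the two cases stated in the text (1/2 -> 2x, 1/4 -> 4x).
    """
    reduction = Fraction(reduction)
    if not 0 < reduction <= 1:
        raise ValueError("reduction ratio must be in (0, 1]")
    return 1 / reduction

factors = {r: peak_factor_without_buffer(r) for r in ("1", "1/2", "1/4")}
```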
<3.1.2 1/4 reduction>
FIG. 22 is a diagram showing a data supply state of each unit and its change over time when 1/4 reduction is performed by the media processor shown in FIG.
[0087]
In FIG. 22, the horizontal and vertical axes of the graph are the same as those in FIG.
A solid line 804 on the graph indicates a supply state of frame data from the decoder unit 301 to the frame memory 302. A broken line 805 on the graph indicates a supply state of frame data from the frame memory 302 to the vertical filter unit 303. A broken line 806 on the graph indicates a supply state of 1st field reduced image data from the vertical filter unit 303 to the buffer memory 304. A broken line 807 on the graph indicates a supply state of 2nd field reduced image data from the vertical filter unit 303 to the buffer memory 304. A solid line 808 on the graph indicates a supply state of 1st field reduced image data from the buffer memory 304 to the image output unit 305. A solid line 809 on the graph indicates a supply state of 2nd field reduced image data from the buffer memory 304 to the image output unit 305.
[0088]
As shown in the figure, between the decoder unit 301 and the frame memory 302, the ability to transfer one frame of frame data in a 2V period is sufficient. Between the frame memory 302 and the vertical filter unit 303, the ability to transfer 1/2 frame of frame data in a 1V period is sufficient. It suffices that the decoder unit 301 has the calculation capability to generate one frame of frame data in a 2V period, and that the vertical filter unit 303 has the calculation capability to filter 1/2 frame of frame data in a 1V period. Between the vertical filter unit 303 and the buffer memory 304, the ability to transfer 1/8 frame of frame data in a 1V period is sufficient. Between the buffer memory 304 and the image output unit 305, the ability to transfer 1/8 frame of frame data in a 1V period is sufficient. A frame memory 302 that can hold one frame of frame data and a buffer memory 304 that can hold 1/4 frame of frame data are required.
[0089]
Each of these required capabilities is an average over a period of at least 1V, and even if the image is reduced further, no large peak performance is required over a short period. The highest processing performance is required when there is no reduction. In this case, between the decoder unit 301 and the frame memory 302, the ability to transfer one frame of frame data in a 2V period is sufficient. Between the frame memory 302 and the vertical filter unit 303, the ability to transfer 1/2 frame of frame data in a 1V period is sufficient. The decoder unit 301 only needs the calculation capability to generate one frame of frame data in a 2V period. The vertical filter unit 303 only needs the calculation capability to filter 1/2 frame of frame data in a 1V period. Between the vertical filter unit 303 and the buffer memory 304, the ability to transfer 1/2 frame of frame data in a 1V period is sufficient. Between the buffer memory 304 and the image output unit 305, the ability to transfer 1/2 frame of frame data in a 1V period is sufficient. The frame memory 302 needs to hold one frame of frame data, and the buffer memory 304 only needs to hold one frame of frame data. With these capabilities, any vertical reduction process can be performed. As a result, the circuit scale can be reduced and the operation clock can be lowered.
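The figures quoted for the no-reduction, 1/2, and 1/4 cases can be collected into one small Python table, sketched below. The function name `requirements_with_buffer` and the dictionary keys are hypothetical, and the scaling of the last three entries with the reduction ratio is inferred from those three cases rather than stated as a general formula in the text.

```python
from fractions import Fraction

def requirements_with_buffer(reduction):
    """Per-stage requirements (in frames) when buffer memory 304 is used.

    The entries that depend on `reduction` follow a scaling inferred from
    the no-reduction, 1/2, and 1/4 cases given in the text.
    """
    r = Fraction(reduction)
    return {
        "decoder -> frame memory, per 2V": Fraction(1),
        "frame memory -> vertical filter, per 1V": Fraction(1, 2),
        "vertical filter -> buffer memory, per 1V": r / 2,
        "buffer memory -> image output, per 1V": r / 2,
        "frame memory capacity": Fraction(1),
        "buffer memory capacity": r,
    }

half = requirements_with_buffer("1/2")
quarter = requirements_with_buffer("1/4")
full = requirements_with_buffer(1)
```

The first two rows do not depend on the reduction ratio, which reflects the point made above that the no-reduction case sets the highest requirement.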
<3.2 Vertical Filter Processing (Part 2)>
FIG. 24 is a schematic block diagram showing the flow of data when vertical filtering is performed in the media processor.
[0090]
The figure includes a decoding unit 401, a buffer memory 402, a vertical filter unit 403, a buffer memory 404, a video output unit 405, and a control unit 406. The decoding unit 401, the vertical filter unit 403, the buffer memory 404, and the video output unit 405 are the same as the correspondingly named components in FIG. 19. Therefore, the description of the points in common is omitted, and the differences are mainly described.
[0091]
The buffer memory 402 is different from the frame memory 302 in that the capacity may be smaller than the storage capacity for one frame.
The vertical filter unit 403 differs from the vertical filter unit 303 in that it notifies the control unit 406 of the fact (filter state) every time filtering of 64 lines in the vertical direction (4 macroblock lines in the frame before processing) is completed. The unit of notification may instead be 2 to 3 macroblock lines.
[0092]
The decoding unit 401 is different from the decoding unit 301 in that the decoding unit 401 notifies the control unit 406 to that effect (decoding state) every time decoding of 64 lines is completed. The unit of notification may be a unit of 16 lines.
The control unit 406 corresponds to the IOP 211 in FIG. 2 and monitors the operation states of the decoding unit 401 and the vertical filter unit 403 based on the notifications from each, and controls the decoding unit 401 and the vertical filter unit 403 so that the vertical filter processing does not overtake the decoding processing and the decoding processing does not overtake the vertical filter processing. That is, the control unit 406 performs the following two kinds of control. One is to prevent the vertical filter unit 403 from filtering the pixel data group of a macroblock line of the previous frame (or field) because the pixel data of the macroblock line to be filtered has not yet been written to the buffer memory 402 by the decoding unit 401. The other is to prevent the decoding unit 401 from overwriting, with the pixel data group of the next frame, a macroblock line that is to be vertically filtered by the vertical filter unit 403 but has not yet been processed.
[0093]
FIG. 25 is an explanatory diagram showing the control contents in the control unit 406.
The horizontal axis in the figure is time, and shows the operations of the control unit 406, VSYNC (vertical synchronization signal), decoding unit 401, vertical filter unit 403, and video output unit 405.
As shown in the figure, the decoding unit 401 notifies the control unit 406 every time 64 lines have been decoded, and the vertical filter unit 403 notifies the control unit 406 every time 64 lines have been filtered. Based on these notifications, the control unit 406 holds and updates the line number Nd up to which decoding has been completed and the line number Nf up to which filtering has been completed, and controls the decoding unit 401 and the vertical filter unit 403 so as to satisfy Nd (current frame) > Nf (current frame) and Nd (next frame) < Nf (current frame). Specifically, the control unit 406 temporarily stops one of the decoding unit 401 and the vertical filter unit 403 when Nd and Nf approach each other (when the difference becomes equal to or less than a threshold value). Nd and Nf may be macroblock line numbers.
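One way to picture this control is as a producer/consumer rule over a bounded buffer, as in the Python sketch below. This is a simplified model under stated assumptions and not the control unit's actual logic. The buffer memory 402 is treated as a ring buffer of `capacity` lines, Nd and Nf are modeled as cumulative line counts within one frame, and the names `control_step` and `simulate` are hypothetical.

```python
def control_step(nd, nf, total_lines, capacity, step=64):
    """Decide which units may run next; returns (run_decoder, run_filter).

    nd, nf   -- cumulative line counts decoded / filtered (hypothetical counters)
    capacity -- lines that buffer memory 402 can hold (assumed ring buffer)
    """
    # The filter may proceed only over lines that are already decoded.
    run_filter = nf < total_lines and nf + step <= nd
    # The decoder may proceed only if it does not overwrite unfiltered lines.
    run_decoder = nd < total_lines and (nd + step) - nf <= capacity
    return run_decoder, run_filter

def simulate(total_lines=256, capacity=128, step=64):
    nd = nf = 0
    trace = []
    while nf < total_lines:
        run_decoder, run_filter = control_step(nd, nf, total_lines, capacity, step)
        if run_decoder:
            nd += step
        if run_filter:
            nf += step
        trace.append((nd, nf))
        assert nf <= nd <= nf + capacity   # neither unit overtakes the other
    return trace

trace = simulate()
```

In this model a unit is simply not started for one step when the other is too close, which corresponds to the temporary stop described in the text.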
[0094]
In the above, the control unit 406 temporarily stops one of the decoding unit 401 and the vertical filter unit 403 when Nd and Nf approach each other. However, the determination of whether Nd and Nf have approached each other, and the control for temporarily stopping the decoding unit 401 or the vertical filter unit 403, may be handled by a unit other than the control unit 406.
[0095]
For example, the vertical filter unit 403 notifies the decoding unit 401 of the filter state, and the decoding unit 401 determines whether Nd and Nf have approached according to the notification of the filter state and the internal decoding state. The decoding operation may be temporarily stopped or the vertical filter unit 403 may be temporarily stopped according to the determination result.
[0096]
Or, conversely, the decoding unit 401 may notify the vertical filter unit 403 of the decoding state, and the vertical filter unit 403 may determine whether Nd and Nf have approached each other according to the notified decoding state and its internal filter state, and, according to the determination result, temporarily stop the filter processing or temporarily stop the decoding unit 401.
<3.2.1 1/2 reduction>
FIG. 26 is a diagram showing the supply data amount of each part when the 1/2 reduction process is performed in FIG.
[0097]
The horizontal axis of the graph 901 indicates the amount of frame data on the buffer memory 402, and the vertical axis indicates time. The horizontal axis of the graph 902 indicates the amount of frame data on the buffer memory 404, and the vertical axis indicates time. A graph 903 shows the state of the image output unit 405 arranged in time series, and the time axis matches the vertical axes of the graphs 901 and 902.
[0098]
A solid line 904 on the graph indicates a supply state of frame data from the decoder unit 401 to the buffer memory 402. A broken line 905 on the graph indicates a frame data supply state from the buffer memory 402 to the vertical filter unit 403. A broken line 906 on the graph indicates a supply state of 1st field reduced image data from the vertical filter unit 403 to the buffer memory 404. A broken line 907 on the graph indicates a supply state of the 2nd field reduced image data from the vertical filter unit 403 to the buffer memory 404. A solid line 908 on the graph indicates a supply state of 1st field reduced image data from the buffer memory 404 to the image output unit 405. A solid line 909 on the graph indicates a supply state of the 2nd field reduced image data from the buffer memory 404 to the image output unit 405.
[0099]
As shown in the graph 901, immediately after the supply of n frame data from the decoder unit 401 to the buffer memory 402 is started, the supply of n frame data from the buffer memory 402 to the vertical filter unit 403 is started. Control is performed so that the supply of n frame data from the buffer memory 402 to the vertical filter unit 403 is completed immediately after the supply of n frame data from the decoder unit 401 to the buffer memory 402 is completed. As shown by a graph 902, control is performed so that the supply of frame data of n frames from the vertical filter unit 403 to the buffer memory 404 is completed during n−1 frame display.
[0100]
By controlling the apparatus in this manner, the following are required: between the decoder unit 401 and the buffer memory 402, a frame data transfer capability of 1 frame in a 2V period; between the buffer memory 402 and the vertical filter unit 403, a frame data transfer capability of 1 frame in a 2V period; in the decoder unit 401, an arithmetic capability to generate 1 frame of frame data in a 2V period; in the vertical filter unit 403, an arithmetic capability to filter 1 frame of frame data in a 2V period; between the vertical filter unit 403 and the buffer memory 404, a frame data transfer capability of 1/2 frame in a 2V period; between the buffer memory 404 and the image output unit 405, a frame data transfer capability of 1/4 frame in a 1V period; a buffer memory 402 that can hold frame data for several lines; and a buffer memory 404 that can hold 1 frame of frame data.
<3.2.2 1/4 reduction>
FIG. 27 is a diagram showing the data supply amount of each part when 1/4 reduction is performed in FIG.
[0101]
The horizontal axis of the graph 1001 indicates the amount of frame data on the buffer memory 402, and the vertical axis indicates time. The horizontal axis of the graph 1002 indicates the amount of frame data on the buffer memory 404, and the vertical axis indicates time. A graph 1003 is a graph in which the states of the image output unit 405 are arranged in time series, and the time axis matches the vertical axes of the graphs 1001 and 1002.
[0102]
A solid line 1004 on the graph indicates a frame data supply state from the decoder unit 401 to the buffer memory 402. A broken line 1005 on the graph indicates a supply state of frame data from the buffer memory 402 to the vertical filter unit 403. A broken line 1006 on the graph indicates a supply state of 1st field reduced image data from the vertical filter unit 403 to the buffer memory 404. A broken line 1007 on the graph indicates a supply state of the 2nd field reduced image data from the vertical filter unit 403 to the buffer memory 404. A solid line 1008 on the graph indicates a supply state of 1st field reduced image data from the buffer memory 404 to the image output unit 405. A solid line 1009 on the graph indicates a supply state of the 2nd field reduced image data from the buffer memory 404 to the image output unit 405.
[0103]
By controlling the apparatus in this manner, a frame data transfer capability of 1 frame in a 2V period is sufficient between the decoder unit 401 and the buffer memory 402, and a frame data transfer capability of 1 frame in a 2V period is sufficient between the buffer memory 402 and the vertical filter unit 403. It is sufficient for the decoder unit 401 to have an arithmetic capability to generate 1 frame of frame data in a 2V period, and for the vertical filter unit 403 to have an arithmetic capability to filter 1 frame of frame data in a 2V period. A frame data transfer capability of 1/4 frame in a 2V period is sufficient between the vertical filter unit 403 and the buffer memory 404, and a frame data transfer capability of 1/8 frame in a 1V period is sufficient between the buffer memory 404 and the image output unit 405. The buffer memory 402 only needs to hold frame data for several lines, and the buffer memory 404 only needs to hold 1/2 frame of frame data.
[0104]
Each of these requirements is an average capability over a period of at least 1V; no large peak performance over a short period is required for any reduction ratio.
Even in the most demanding case, in which no reduction is performed, the requirements are as follows: between the decoder unit 401 and the buffer memory 402, a frame data transfer capability of 1 frame in a 2V period; between the buffer memory 402 and the vertical filter unit 403, a frame data transfer capability of 1 frame in a 2V period; in the decoder unit 401, an arithmetic capability to generate 1 frame of frame data in a 2V period; in the vertical filter unit 403, an arithmetic capability to filter 1 frame of frame data in a 2V period; between the vertical filter unit 403 and the buffer memory 404, a frame data transfer capability of 1 frame in a 2V period; between the buffer memory 404 and the image output unit 405, a frame data transfer capability of 1/2 frame in a 1V period; a buffer memory 402 that can hold frame data for several lines; and a buffer memory 404 that can hold 2 frames of frame data. With these capabilities, any vertical reduction processing can be performed. As a result, the circuit scale can be reduced and the operation clock can be lowered.
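The three cases described above (no reduction, 1/2 reduction and 1/4 reduction) follow one pattern, which the following Python sketch tabulates. The function name `required_capabilities`, the dictionary keys, and the extension to reduction ratios other than 1, 1/2 and 1/4 are assumptions made for illustration; the description only states the values for those three ratios.

```python
from fractions import Fraction


def required_capabilities(ratio):
    """Tabulate the requirements for a vertical reduction ratio.

    Transfer and arithmetic figures are in frames per stated period
    (2V or 1V); the buffer memory 404 capacity is in frames.
    The formulas are fitted to the three cases given in the description
    (ratio 1, 1/2, 1/4) and are an assumption for any other ratio.
    """
    r = Fraction(ratio)
    return {
        # Stages upstream of the vertical filter do not depend on the ratio.
        "decoder401_to_buffer402_per_2V": Fraction(1),
        "buffer402_to_filter403_per_2V": Fraction(1),
        "decoder401_generate_per_2V": Fraction(1),
        "filter403_process_per_2V": Fraction(1),
        # Stages downstream of the vertical filter scale with the ratio.
        "filter403_to_buffer404_per_2V": r,
        "buffer404_to_output405_per_1V": r / 2,
        "buffer404_capacity_frames": 2 * r,
    }


for ratio in (Fraction(1), Fraction(1, 2), Fraction(1, 4)):
    caps = required_capabilities(ratio)
    print(ratio,
          caps["filter403_to_buffer404_per_2V"],
          caps["buffer404_to_output405_per_1V"],
          caps["buffer404_capacity_frames"])
```

Only the stages after the vertical filter unit 403 scale with the ratio, so hardware sized for the no-reduction case covers the reduced cases as well.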
<4. Modification>
FIGS. 28 and 29 are diagrams illustrating a first modification of the left half and the right half of the pixel parallel processing unit. In these drawings, the same constituent elements as those in FIGS. 3 and 4 are denoted by the same reference numerals, and their description is omitted.
[0105]
FIGS. 28 and 29 include pixel processing units 1a to 16a instead of the pixel processing units 1 to 16 of FIGS. 3 and 4, and pixel transfer units 17a and 18a instead of the pixel transfer units 17 and 18. Since the pixel processing units 1a to 16a have the same configuration, the pixel processing unit 1a will be described as a representative.
The pixel processing unit 1a includes a selection unit A104a and a selection unit B105a instead of the selection unit A104 and the selection unit B105 in the pixel processing unit 1.
[0106]
The selection unit A104a differs from the selection unit A104 in that the number of inputs is increased from two to three. That is, the selection unit A104a has an additional pixel data input from the delay unit (delay unit B) of the pixel transfer unit (or pixel processing unit) two positions away.
Similarly, the selection unit B105a has an additional pixel data input from the delay unit (delay unit B) of the pixel transfer unit (or pixel processing unit) two positions away.
[0107]
The pixel transfer unit 17a includes selection units B1703a to G1708a instead of the selection units B1703 to G1708. The selection units B1703a to G1708a each have three inputs instead of two. The added input is the pixel data input from the delay unit two positions to the left.
The pixel transfer unit 18a includes selection units B1803a to G1808a instead of the selection units B1803 to G1808. The selection units B1803a to G1808a each have three inputs instead of two. The added input is the pixel data input from the delay unit two positions to the right.
[0108]
According to this configuration, it is possible to perform filter processing that uses the pixel to be processed and, in order, the pixels two positions to its left and right, that is, every other pixel.
For example, the pixel processing unit 1a can calculate the following equation.
a0 * X9 + a1 * (X11 + X7) + a2 * (X13 + X5) + a3 * (X15 + X3)
FIGS. 30 and 31 are diagrams illustrating a second modification of the left half and the right half of the pixel parallel processing unit.
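As an illustration of the first modification, the following Python sketch evaluates a symmetric filter whose taps are two pixels apart. It models only the arithmetic result, not the selection units or delay units. The function name `skip_symmetric_fir`, the sample pixel values and the coefficient values are hypothetical.

```python
def skip_symmetric_fir(x, center, coeffs, stride=2):
    """Symmetric FIR whose taps are `stride` pixels apart.

    x      : mapping from pixel index to pixel value (X1, X2, ...)
    center : index of the pixel being processed
    coeffs : [a0, a1, a2, ...], a0 applies to the center pixel
    """
    total = coeffs[0] * x[center]
    for k in range(1, len(coeffs)):
        # Each coefficient multiplies the sum of the two taps that are
        # k * stride pixels to the right and left of the center.
        total += coeffs[k] * (x[center + k * stride] + x[center - k * stride])
    return total


# Hypothetical sample data: X1..X32 with arbitrary values.
X = {i: i * i for i in range(1, 33)}
a = [4, 3, 2, 1]  # hypothetical a0..a3

# Output corresponding to the pixel processing unit 1a (center pixel X9).
result = skip_symmetric_fir(X, 9, a)
print(result)
```

With `stride=1` the same function gives the ordinary 7-tap symmetric filter over adjacent pixels.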
[0109]
FIGS. 30 and 31 include a pixel processing unit 1b and a pixel processing unit 16b instead of the pixel processing unit 1 and the pixel processing unit 16 of FIGS.
The pixel processing unit 1b includes a selection unit B105b instead of the selection unit B105 in the pixel processing unit 1. The selection unit B105b differs from the selection unit B105 in that it has a feedback input from the delay unit B107.
[0110]
The pixel processing unit 16b includes a selection unit A1604b instead of the selection unit A1604 in the pixel processing unit 16. The selection unit A1604b differs from the selection unit A1604 in that it has a feedback input from the delay unit A1606.
According to this configuration, the pixel processing unit 1b performs, for example, the following calculation.
a3 * X6 + a2 * X7 + a1 * X8 + a0 * X9 + a1 * X10 + a2 * X11 + a3 * X12
At this time, the output of the pixel processing unit 15 is as follows.
[0111]
a3 * X20 + a2 * X21 + a1 * X22 + a0 * X23 + a1 * X24 + a2 * X24 + a3 * X24
At this time, the output of the pixel processing unit 16b is as follows.
a3 * X21 + a2 * X22 + a1 * X23 + a0 * X24 + a1 * X24 + a2 * X24 + a3 * X24
As described above, in FIGS. 30 and 31, when the leftmost pixel data of the data string is transferred to the leftmost pixel processing unit 1b, the selection unit B105b selects the feedback input from the delay unit B107 in the pixel processing unit 1b. When the rightmost pixel data of the data string is transferred to the rightmost pixel processing unit 16b, the selection unit A1604b selects the feedback input from the delay unit A1606.
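The effect of the feedback inputs can be modeled as repeating the edge pixel whenever a tap would fall outside the data string. The following Python sketch is a behavioral model under that assumption, not a description of the circuit. The function name `edge_hold_fir`, the assumed data string X1 to X24, and the sample values are hypothetical.

```python
def edge_hold_fir(x, center, coeffs, first, last):
    """Symmetric FIR over adjacent pixels with edge pixels repeated.

    Taps that would fall outside [first, last] reuse the pixel at the
    edge, which models the feedback input described in the text.
    """
    total = coeffs[0] * x[center]
    for k in range(1, len(coeffs)):
        left = max(center - k, first)    # hold the leftmost pixel
        right = min(center + k, last)    # hold the rightmost pixel
        total += coeffs[k] * (x[left] + x[right])
    return total


# Hypothetical data string X1..X24 and coefficients a0..a3.
X = {i: 3 * i + 1 for i in range(1, 25)}
a = [4, 3, 2, 1]

out_center_9 = edge_hold_fir(X, 9, a, first=1, last=24)    # no edge reached
out_center_23 = edge_hold_fir(X, 23, a, first=1, last=24)  # X24 repeated
out_center_24 = edge_hold_fir(X, 24, a, first=1, last=24)  # X24 repeated
print(out_center_9, out_center_23, out_center_24)
```

The three calls correspond to the three example expressions given for this modification.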
[0112]
FIGS. 32 and 33 are diagrams illustrating a third modification of the left half and the right half of the pixel parallel processing unit.
FIGS. 32 and 33 include pixel processing units 1c to 16c instead of the pixel processing units 1 to 16 of FIGS. 3 and 4, and pixel transfer units 17c and 18c instead of the pixel transfer units 17 and 18. Since the pixel processing units 1c to 16c have the same configuration, the pixel processing unit 1c will be described as a representative.
[0113]
The pixel processing unit 1c includes a selection unit A104c and a selection unit B105c instead of the selection unit A104 and the selection unit B105 in the pixel processing unit 1.
The selection unit A104c differs from the selection unit A104 in that the number of inputs is increased from two to three. That is, the selection unit A104c has an additional pixel data input from the delay unit (delay unit B) of the pixel transfer unit (or pixel processing unit) two positions away.
[0114]
The selection unit B105c has two additional inputs: the pixel data input from the delay unit (delay unit B) of the pixel transfer unit (or pixel processing unit) two positions away, and the feedback input from the delay unit B107.
As with the pixel transfer units 17a and 18a shown in FIGS. 28 and 29, the selection units of the pixel transfer units 17c and 18c have three inputs instead of two.
[0115]
According to this configuration, the pixel processing unit 1c performs, for example, the following calculation.
a3 * X9 + a2 * X9 + a1 * X9 + a0 * X9 + a1 * X11 + a2 * X13 + a3 * X15
At this time, the output of the pixel processing unit 2c is as follows.
a3 * X10 + a2 * X10 + a1 * X10 + a0 * X10 + a1 * X12 + a2 * X14 + a3 * X16
At this time, the output of the pixel processing unit 15c is as follows.
a3 * X17 + a2 * X19 + a1 * X21 + a0 * X23 + a1 * X23 + a2 * X23 + a3 * X23
At this time, the output of the pixel processing unit 16c is as follows.
a3 * X18 + a2 * X20 + a1 * X22 + a0 * X24 + a1 * X24 + a2 * X24 + a3 * X24
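The four expressions above combine the every-other-pixel taps of the first modification with the edge handling of the second. The following Python sketch reproduces them under the assumption that a tap which would leave the data string reuses the last valid pixel on its side. The function name `skip_hold_fir`, the assumed data string X9 to X24, and the sample values are hypothetical, and the sketch models the result only, not the circuit.

```python
def skip_hold_fir(x, center, coeffs, first, last, stride=2):
    """Symmetric FIR with taps `stride` apart and held edge values.

    Moving outward from the center, each side advances by `stride`
    only while the next tap stays inside [first, last]; otherwise the
    previous tap on that side is reused (models the feedback input).
    """
    total = coeffs[0] * x[center]
    left = right = center
    for k in range(1, len(coeffs)):
        if left - stride >= first:
            left -= stride
        if right + stride <= last:
            right += stride
        total += coeffs[k] * (x[left] + x[right])
    return total


# Hypothetical data string X9..X24 and coefficients a0..a3.
X = {i: 5 * i - 2 for i in range(9, 25)}
a = [4, 3, 2, 1]

out_1c = skip_hold_fir(X, 9, a, first=9, last=24)
out_2c = skip_hold_fir(X, 10, a, first=9, last=24)
out_15c = skip_hold_fir(X, 23, a, first=9, last=24)
out_16c = skip_hold_fir(X, 24, a, first=9, last=24)
print(out_1c, out_2c, out_15c, out_16c)
```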
FIG. 34 is a diagram showing a modification of the POUA 207.
[0116]
The POUA 207 shown in the figure adds an upsampling circuit 22a and a downsampling circuit 23a to the configuration of FIG. 2. Description of the points shared with FIG. 2 is omitted, and the following focuses on the differences.
The upsampling circuit 22a expands the pixel data group input from the input buffer group 22 in the vertical direction. For example, to interpolate the pixel data so that the pixel data group input from the input buffer group 22 is doubled in the vertical direction, it outputs the same pixel data group to the pixel parallel processing unit 21 twice for each single input of a pixel data group from the input buffer group 22.
[0117]
The downsampling circuit 23a reduces the pixel data group input from the pixel parallel processing unit 21 in the vertical direction. For example, it thins out the pixel data so that the pixel data group input from the pixel parallel processing unit 21 is halved in the vertical direction. That is, for every two inputs of a pixel data group from the pixel parallel processing unit 21, one is discarded and one is output.
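The behavior of the two circuits can be sketched in Python as line repetition and line decimation. The function names `upsample_vertical` and `downsample_vertical` are hypothetical, and the choice of keeping the first of every two pixel data groups is an assumption; the description does not say which of the two is discarded.

```python
def upsample_vertical(lines, factor=2):
    """Output each input pixel data group (line) `factor` times."""
    out = []
    for line in lines:
        out.extend(list(line) for _ in range(factor))
    return out


def downsample_vertical(lines, factor=2):
    """Keep one of every `factor` pixel data groups and discard the rest.

    Keeping the first of each group is an assumption for illustration.
    """
    return [list(line) for line in lines[::factor]]


stored = [[1, 2], [3, 4]]            # two lines held in memory
expanded = upsample_vertical(stored)  # four lines fed to the processor
reduced = downsample_vertical(expanded)
print(expanded)
print(reduced)
```

In this sketch the processing stage sees twice as many lines as are stored, while the stored and output data stay at half the vertical size.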
[0118]
According to this configuration, the data is doubled in the vertical direction on the input side of the pixel parallel processing unit 21 and halved in the vertical direction on its output side, so the amount of data per frame in the external memory 220 can be 1/2 in the vertical direction. As a result, the amount of data transferred from the POUC 209 to the POUA 207 can be halved, which relieves the bus bottleneck when accesses to the internal ports of the dual port memory 100 are concentrated.
[0119]
In addition, since the pixel arithmetic device of the present invention performs the filtering process for resizing an image on a plurality of pixels in parallel, it can be used in digital video equipment such as a media processor that handles moving picture compression/expansion processing, resizing, and the like.
[0120]
[Effect of the invention]
A pixel arithmetic device according to the present invention is a pixel arithmetic device that performs filter processing, and includes N pixel processing means, supply means for supplying N pixel data and filter coefficients, and control means for operating the N pixel processing means in parallel.
Each pixel processing means performs a calculation using the pixel data supplied by the supply means and the filter coefficient, then acquires pixel data from the pixel processing means adjacent to it, performs a calculation using the acquired pixel data, and accumulates the calculation results. The control means controls the N pixel processing means so that the acquisition of pixel data from the adjacent pixel processing means and the calculation and accumulation using the acquired pixel data are repeated a number of times corresponding to the number of taps.
[0121]
Here, the N pixel processing units form a first shifter that shifts N pixel data to the right and a second shifter that shifts N pixel data to the left. Each pixel processing means performs an operation using two pixel data shifted out from two adjacent pixel processing means.
According to this configuration, it is possible to make the number of taps variable, and it is possible to perform a filtering process that speeds up the process without increasing the frequency.
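The mechanism summarized above can be simulated in a few lines of Python. This is a minimal behavioral sketch, assuming a symmetric filter and assuming that registers outside the N processing units (the role played by the pixel transfer units) supply the extra pixels on each side. The function name `parallel_symmetric_fir` and the sample values are hypothetical.

```python
def parallel_symmetric_fir(pixels, coeffs, n_units):
    """Simulate N units accumulating a symmetric FIR via two shifters.

    pixels : n_units + 2 * half values, where half = len(coeffs) - 1;
             the extra `half` values on each side stand in for the
             registers outside the N processing units
    coeffs : [a0, a1, ..., a_half]; the tap count is 2 * half + 1
    """
    half = len(coeffs) - 1
    if len(pixels) != n_units + 2 * half:
        raise ValueError("pixels must hold n_units + 2 * half values")

    right_chain = list(pixels)  # first shifter: moves data to the right
    left_chain = list(pixels)   # second shifter: moves data to the left

    # Initial step: every unit multiplies its own pixel by a0.
    acc = [coeffs[0] * pixels[half + i] for i in range(n_units)]

    # One loop iteration models one clock input.
    for k in range(1, half + 1):
        right_chain = [0] + right_chain[:-1]  # shift right by one position
        left_chain = left_chain[1:] + [0]     # shift left by one position
        for i in range(n_units):              # all units work in parallel
            pos = half + i
            acc[i] += coeffs[k] * (right_chain[pos] + left_chain[pos])
    return acc


pixels = list(range(1, 23))   # 16 units plus 3 extra pixels per side
coeffs = [4, 3, 2, 1]         # hypothetical a0..a3 (7 taps)
out = parallel_symmetric_fir(pixels, coeffs, n_units=16)
print(out)
```

Changing the length of `coeffs` changes the number of clock inputs and therefore the number of taps, without changing the structure of the loop. All N outputs are ready after `half` clock inputs rather than one output per clock.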
[0122]
In addition, the pixel arithmetic device of the present invention supplies pixel data of a difference image and pixel data of a reference frame as pixel data from a supply unit.
According to this configuration, the device can be used not only for the filtering process but also for the MC (motion compensation) process. Because the filter device and the MC circuit need not be provided independently, the circuit scale can be reduced.
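As a rough illustration of the MC use, the following Python sketch adds difference-image pixels to reference-frame pixels, one pair per processing unit. The function name `motion_compensate`, the 8-bit clipping range and the sample values are assumptions for illustration; this summary does not state the output range or rounding.

```python
def motion_compensate(diff_row, ref_row):
    """Reconstruct pixels as reference + difference, clipped to 0..255.

    The clipping range assumes 8-bit pixel data (an assumption).
    """
    if len(diff_row) != len(ref_row):
        raise ValueError("rows must have the same length")
    return [min(255, max(0, d + r)) for d, r in zip(diff_row, ref_row)]


diff_row = [-10, 5, 20]   # hypothetical difference-image pixels
ref_row = [5, 250, 100]   # hypothetical reference-frame pixels
decoded = motion_compensate(diff_row, ref_row)
print(decoded)
```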
[Brief description of the drawings]
FIG. 1 is a block diagram showing an example of a circuit that performs FIR filter processing in the prior art.
FIG. 2 is a block diagram illustrating a configuration of a media processor including a pixel operation unit.
FIG. 3 is a block diagram showing a configuration of a pixel operation unit (POUA, POUB).
FIG. 4 is a block diagram illustrating a configuration of a left half of a pixel parallel processing unit.
FIG. 5 is a block diagram showing a configuration of a right half of a pixel parallel processing unit.
FIG. 6A is a block diagram illustrating a detailed configuration of the input buffer group 22.
FIG. 6B is a block diagram illustrating a detailed configuration of the selection unit in the input buffer group 22.
FIG. 7 is a block diagram showing a configuration of the output buffer group 23.
FIG. 8 is a diagram illustrating initial input values of pixel data when filter processing is performed in the pixel arithmetic unit.
FIG. 9 is an explanatory diagram showing initial input values of pixel data to the pixel processing unit 1.
FIG. 10 is a diagram illustrating a calculation process in filter processing in the pixel processing unit 1.
FIG. 11 is an explanatory diagram showing calculation contents of filter processing in the pixel processing unit 1.
FIG. 12 is a diagram showing input / output pixel data when MC (motion compensation) processing (P picture) is performed in the pixel operation unit.
FIG. 13 is an explanatory diagram showing a decoding target frame and a reference frame in MC processing.
FIG. 14 is a diagram showing input / output pixel data when MC processing (B picture) is performed in the pixel operation unit.
FIG. 15 is a diagram showing input / output pixel data when OSD (on-screen display) processing is performed in the pixel calculation unit.
FIG. 16 is an explanatory diagram of OSD (on-screen display) processing in the pixel calculation unit.
FIG. 17 is a diagram illustrating input / output pixel data when ME (motion prediction) processing is performed in the pixel calculation unit.
FIG. 18 is an explanatory diagram of ME (motion prediction) in the pixel calculation unit.
FIG. 19 is a schematic block diagram illustrating a data flow when performing vertical filter processing in a media processor.
FIG. 20 is an explanatory diagram when performing vertical ½ reduction.
FIG. 21 is an explanatory diagram when performing vertical ½ reduction in the prior art.
FIG. 22 is an explanatory diagram when vertical ¼ reduction is performed.
FIG. 23 is an explanatory diagram when vertical ¼ reduction is performed in the prior art.
FIG. 24 is another schematic block diagram showing the flow of data when vertical filtering is performed in the media processor.
FIG. 25 is an explanatory diagram showing timings of decoding processing and vertical filter processing.
FIG. 26 is an explanatory diagram when vertical ½ reduction is performed.
FIG. 27 is an explanatory diagram when vertical ¼ reduction is performed.
FIG. 28 is a diagram illustrating a first modification of the left half of the pixel parallel processing unit.
FIG. 29 is a diagram illustrating a first modification of the right half of the pixel parallel processing unit.
FIG. 30 is a diagram illustrating a second modification of the left half of the pixel parallel processing unit.
FIG. 31 is a diagram illustrating a second modification of the right half of the pixel parallel processing unit.
FIG. 32 is a diagram illustrating a third modification of the left half of the pixel parallel processing unit.
FIG. 33 is a diagram illustrating a third modification of the right half of the pixel parallel processing unit.
FIG. 34 is a diagram showing a modification of the pixel operation unit (POUA).
[Explanation of symbols]
1 Pixel processing unit
1 to 16 Pixel processing units
17 Pixel transfer unit
18 Pixel transfer unit
21 Pixel parallel processing unit
22 Input buffer group
22a Upsampling circuit
23 Output buffer group
23a Downsampling circuit
23a-23h Latch
23a-23p Latch
23i-23p Latch
24 Instruction memory
24a-24h Selector
24a-24p Selector
24i-24p Selector
25 Instruction decoder
26 Indicator circuit
27 DDA circuit
100 Dual port memory
104 Selection unit A
104a Selection unit A
104c Selection unit A
105 Selection unit B
105a Selection unit B
105b Selection unit B
105c Selection unit B
107 Delay unit B
109 Delay unit D
120 Adder A
200 Media processor
201 Stream unit
201 Input port A
202 I / O buffer
202 Input port B
203 Setup processor
203 Input port C
204 Bitstream FIFO
205 Variable length code decoding unit
206 TE
207 POUA
208 POUB
209 POUC
210 Audio unit
211 IOP
212 Video buffer memory
213 Video unit
214 Host unit
215 RE
216 Filter section
217 Setup memory
218 Dedicated LSI
220 External memory
401 Decoder unit
402 Buffer memory
403 Vertical filter unit
404 Buffer memory
405 Image output unit
406 Control unit

Claims

A pixel arithmetic device that performs filter processing, comprising:
N pixel processing means;
supply means for supplying N pixel data and filter coefficients; and
control means for operating the N pixel processing means in parallel,
wherein the N pixel processing means form a first shifter that shifts N pixel data to the right at each clock input and a second shifter that shifts N pixel data to the left at each clock input,
the first shifter shifts the pixel data supplied by the supply means to the right, and thereafter shifts to the right the pixel data that was shifted right from the left-adjacent pixel processing means at the previous clock input,
the second shifter shifts the pixel data supplied by the supply means to the left, and thereafter shifts to the left the pixel data that was shifted left from the right-adjacent pixel processing means at the previous clock input, and
each pixel processing means performs a calculation using the pixel data supplied by the supply means and the filter coefficient, then acquires, at each clock input, the two pixel data shifted at each clock input from the two pixel processing means adjacent to it, performs a calculation at each clock input using the acquired pixel data, and accumulates the calculation result at each clock input.

The pixel arithmetic device according to claim 1, further comprising means for specifying the number of taps for filtering,
wherein the pixel processing means repeats the acquisition of pixel data from the adjacent pixel processing means, and the calculation and accumulation using the acquired pixel data, for the number of clock inputs corresponding to the specified number of taps.