JP2009123074A

JP2009123074A - Image processing apparatus

Info

Publication number: JP2009123074A
Application number: JP2007298142A
Authority: JP
Inventors: Kazuhiro Saito; 和宏齋藤; Masahiko Yoshimoto; 雅彦吉本; Hiroshi Kawaguchi; 博川口; Junichi Miyakoshi; 純一宮越; Yuichiro Murachi; 勇一郎村地; Masanari Hamamoto; 真生濱本; Takahiro Iinuma; 隆弘飯沼; Tomokazu Ishihara; 朋和石原
Original assignee: MegaChips Corp; Kobe University NUC
Current assignee: MegaChips Corp; Kobe University NUC
Priority date: 2007-11-16
Filing date: 2007-11-16
Publication date: 2009-06-04
Anticipated expiration: 2027-11-16
Also published as: JP5020029B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus that transfers fewer pixel values than a SIMD configuration and also avoids the problem of an RCSA configuration, namely the increase in cycle count caused by H.264 block partitioning.
SOLUTION: The image processing apparatus includes an array in which a plurality of arithmetic elements PE are arranged in a matrix. The array is divided into a plurality of sub-blocks SBSA, each containing a predetermined number of arithmetic elements PE. Each sub-block SBSA has multiplexers 10A and 11A that select whether the sub-block is connected to an adjacent sub-block. Switching the settings of the multiplexers 10A and 11A according to the size of the image to be processed sets one or more blocks in the array, each containing one or more sub-blocks SBSA.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an image processing apparatus that performs motion search on images.

This section describes two configurations as background art. The first is a SIMD (Single Instruction Multiple Data) configuration, which has only pixel-to-pixel evaluation value arithmetic units. The second is an RCSA (Ring Connected Systolic Array) configuration, a systolic array with pixel buffers that improve pixel reuse. The evaluation value arithmetic units are designed on the assumption that SAD (Sum of Absolute Differences) is used as the evaluation value, but other evaluation values can also be used. Both configurations are assumed to be fed by two external stores. The first is a buffer SRAM (hereinafter "SWRAM") that holds the reference image data (SW: Search Window). The second is a register file (hereinafter "TB buffer") that holds the image data to be encoded (TB: Template Block). The SWRAM and the TB buffer are each assumed to output at most 64 (= 8 × 8) pixel values simultaneously in one cycle.
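The following is a minimal illustration of the SAD evaluation value assumed above. It computes the SAD between a template block TB and the candidate block of the search window SW whose top-left corner is offset by (dx, dy). The function and argument names are illustrative and do not come from the patent.

```python
def sad(tb, sw, dx=0, dy=0):
    """Sum of absolute differences between template block `tb` and the
    equally sized window of search window `sw` starting at (dx, dy)."""
    h, w = len(tb), len(tb[0])
    return sum(
        abs(tb[y][x] - sw[y + dy][x + dx])
        for y in range(h)
        for x in range(w)
    )


tb = [[10, 20], [30, 40]]
sw = [[10, 20, 99], [30, 41, 99]]
print(sad(tb, sw))        # 1: only one pixel differs, by 1
print(sad(tb, sw, dx=1))  # 10 + 79 + 11 + 59 = 159
```

Each arithmetic element PE computes one `abs(...)` term, and adders sum the terms. The search selects the (dx, dy) with the smallest SAD.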

FIG. 32 shows an example of the SIMD configuration. It consists only of multiple units that compute evaluation values from the pixel values received from the SWRAM and the TB buffer. It has no buffer for holding pixel values, so pixel values cannot be reused. The RRSA (Reconfigurable Ring Connected Systolic Array) configuration of the present invention, described later, has 512 modules that compute per-pixel evaluation values in parallel. For comparison, the SIMD configuration is therefore also assumed to be 512-way parallel, so X and Y in FIG. 32 are 16 and 32, respectively. The SWRAM and the TB buffer output at most 8 × 8 pixels per cycle. Searching a single point for a 16 × 16 block therefore requires four cycles of waiting for RAM reads.
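The following sketch reproduces the four-cycle figure. It assumes that SW and TB are read in parallel, so the 64-pixel-per-cycle read limit alone sets the cost of each point. Because the SIMD array keeps nothing between points, the simplified model multiplies that cost by the number of points. The 8-point figure is an extrapolation from this model and is not a number given in the text.

```python
import math

PIXELS_PER_CYCLE = 64  # 8 x 8 pixels per read from SWRAM / TB buffer


def simd_cycles(block_w, block_h, points):
    """Read-bound cycle estimate for a SIMD array with no pixel buffer.

    Every search point needs all block_w * block_h pixels again, because
    nothing is kept between points (simplified model).
    """
    reads_per_point = math.ceil(block_w * block_h / PIXELS_PER_CYCLE)
    return reads_per_point * points


print(simd_cycles(16, 16, 1))  # 4, matching the text
print(simd_cycles(16, 16, 8))  # 32 under this model
```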

FIG. 33 shows an example of the RCSA configuration, which is disclosed in Non-Patent Document 1 below. In addition to the pixel-to-pixel evaluation value arithmetic units, the RCSA configuration has pixel value buffers, and the units and buffers are connected in a ring. FIGS. 34 and 35 show the internal configurations of the processing element PE and the shift register SR, respectively. As with the SIMD configuration, the RCSA configuration is assumed to be 512-way parallel (512 PEs and 512 SRs), so X and Y in FIG. 33 are 16 and 32, respectively. Searching eight consecutive points for a 16 × 16 block takes three components:

- 4 cycles of RAM reads on the PE-array side;
- 4 cycles of RAM reads on the SR-array side;
- 8 cycles of evaluation value computation.

The total is 4 + 4 + 8 - 1 = 15 cycles. One cycle is subtracted because the evaluation value computation actually takes "initial load + number of shifts" cycles. The last RAM read cycle therefore overlaps with the computation of one evaluation point.
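The 15-cycle figure follows directly from the accounting described above. In the following minimal sketch, the function name and the generalization to other block sizes are assumptions. The point count is also assumed to stay within what the ring can hold without reloading.

```python
import math

PIXELS_PER_CYCLE = 64  # 8 x 8 pixels per RAM read


def rcsa_cycles(block_w, block_h, points):
    """Cycle estimate for a consecutive-point search on an RCSA-style ring.

    The PE array and the SR array are each loaded once from RAM. After that,
    every additional search point costs one shift. One cycle is saved because
    the last RAM read overlaps the first evaluation.
    """
    loads = math.ceil(block_w * block_h / PIXELS_PER_CYCLE)
    pe_reads = loads  # initial load of the PE array
    sr_reads = loads  # initial load of the SR array
    return pe_reads + sr_reads + points - 1


print(rcsa_cycles(16, 16, 8))  # 4 + 4 + 8 - 1 = 15
```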

Non-Patent Document 1: J. Miyakoshi, Y. Murachi, K. Hamano, T. Matsuno, M. Miyama and M. Yoshimoto, "A Low-Power Systolic Array Architecture for Block-Matching Motion Estimation," IEICE Trans. Electronics, Vol. E88-C, No. 4, pp. 559-569, April 2005.

The biggest problem of the SIMD configuration is the enormous volume of pixel value transfers. Because pixel values cannot be reused, each point of a consecutive-point search requires all the pixel values for the computation to be read again from the SWRAM and the TB buffer. This demands a very large pixel transfer bandwidth. In addition, if the number of pixels the RAM can output per cycle is limited, the cycles spent reading pixel values greatly exceed the cycles spent on the computation itself. This pixel-transfer problem is the weakness of the SIMD configuration.
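The following sketch compares the reference (SW) pixels each configuration reads for the 16 × 16, eight-point example. It makes three simplifying assumptions: TB reads are not counted, the RCSA ring is filled exactly once (its 4 + 4 read cycles of 64 pixels each), and all eight points fit in the ring without reloading.

```python
def sw_pixels_simd(block_w, block_h, points):
    # No buffer: the whole reference block is re-read for every search point.
    return block_w * block_h * points


def sw_pixels_rcsa(block_w, block_h):
    # PE array and SR array are each filled once; later points reuse the
    # ring contents instead of reading the SWRAM again.
    return 2 * block_w * block_h


simd = sw_pixels_simd(16, 16, 8)
rcsa = sw_pixels_rcsa(16, 16)
print(simd, rcsa, simd // rcsa)  # 2048 512 4
```

Under these assumptions, the SIMD array reads four times as many reference pixels. The ratio grows with the number of consecutive points.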

By holding reusable pixels during a consecutive-point search, the RCSA configuration transfers far fewer pixel values from the SWRAM and TB buffer than the SIMD configuration. Its problem, however, is that it does not support the block partitioning specific to H.264. Although it is 512-way parallel, all its arithmetic units can only operate in lockstep. A search in mode1 (16 × 16), mode2 (16 × 8), mode3 (8 × 16), or mode4 (8 × 8) therefore ties up all 512 arithmetic units with a single block of the corresponding size. As the macroblock pair (MB-pair) being processed is partitioned more finely, the number of blocks and the number of cycles needed to search one macroblock pair both increase. This increase in cycle count caused by H.264 block partitioning is the weakness of the RCSA configuration.
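The following sketch counts how many blocks of each mode a 16 × 32 macroblock pair contains. Because the RCSA array handles these blocks one after another, its search time for one macroblock pair scales roughly with this count. The linear scaling is a simplification and not a figure from the text.

```python
MB_PAIR_W, MB_PAIR_H = 16, 32  # an H.264 macroblock pair

MODES = {
    "mode1": (16, 16),
    "mode2": (16, 8),
    "mode3": (8, 16),
    "mode4": (8, 8),
}


def blocks_per_mb_pair(block_w, block_h):
    """Number of block_w x block_h partitions inside one macroblock pair."""
    return (MB_PAIR_W // block_w) * (MB_PAIR_H // block_h)


# RCSA model: one block occupies all 512 elements, so blocks are searched
# sequentially and the work per MB pair grows with this count.
for mode, (w, h) in MODES.items():
    print(mode, blocks_per_mb_pair(w, h))  # mode1 2, mode2 4, mode3 4, mode4 8
```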

The present invention was made to solve the above problems of the SIMD and RCSA configurations. Its object is to provide an image processing apparatus that transfers fewer pixel values than the SIMD configuration while also avoiding the problem of the RCSA configuration, namely the increase in cycle count caused by H.264 block partitioning.

An image processing apparatus according to a first invention comprises an array of arithmetic elements arranged in a matrix, which compute evaluation values from the pixel values of an image. The array is divided into a plurality of sub-blocks, each containing a predetermined number of the arithmetic elements. Each sub-block has selection means that selects whether to connect that sub-block to an adjacent sub-block. Switching the settings of the selection means according to the size of the image to be processed sets one or more blocks in the array, each containing one or more sub-blocks.

An image processing apparatus according to a second invention is the apparatus of the first invention in which, when multiple blocks are set in the array, each block can operate independently of the others.

An image processing apparatus according to a third invention is the apparatus of the first or second invention, in which each sub-block has two units:

- a first unit having a plurality of the arithmetic elements;
- a second unit having a plurality of registers that can hold pixel values that have been, or will be, processed by the arithmetic elements of the first unit.

The selection means includes two selectors. One selects the input to the sub-block's own first unit: either its own second unit or the first unit of the adjacent sub-block. The other selects the input to the sub-block's own second unit: either its own first unit or the second unit of the adjacent sub-block.

An image processing apparatus according to a fourth invention is the apparatus of the third invention, in which connected sub-blocks may form one block containing multiple first units and multiple second units. In that case, some of those first units can serve as registers that hold pixel values that have been, or will be, processed by the arithmetic elements of the other first units.

An image processing apparatus according to a fifth invention is the apparatus of any one of the first to fourth inventions, further comprising a storage unit. The storage unit can hold the pixel values of the image portion adjacent, in a predetermined direction, to the image portion loaded in the array. When the evaluation position in the image shifts in that direction, the storage unit supplies its held pixel values to the array.

An image processing apparatus according to a sixth invention is the apparatus of the fifth invention, in which the selection means includes a selector for the input to the sub-block itself, which chooses either the adjacent sub-block or the storage unit.

An image processing apparatus according to a seventh invention is the apparatus of any one of the first to sixth inventions, in which each sub-block has a group of adders. The adders sum the evaluation values computed by the sub-block's arithmetic elements. The group includes frame-image adders, which sum the evaluation values of consecutive rows, and field-image adders, which sum the evaluation values of alternate rows.

In the image processing apparatuses of the first to ninth inventions, the array is divided into a plurality of sub-blocks. Switching the settings of the selection means according to the size of the image to be processed sets one or more blocks in the array, each containing one or more sub-blocks. A macroblock pair may be partitioned more finely so that the number of blocks increases. Even then, the blocks set in the array can be processed simultaneously, so the number of cycles needed to search one macroblock pair does not grow.
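The following sketch shows how eight 8 × 8 sub-blocks could be grouped into blocks for each partition size. It assumes that the sub-blocks tile the 16 × 32 macroblock pair as a grid 2 wide and 4 tall, and it uses a row-major numbering. Both the layout and the numbering are assumptions for illustration; FIG. 4 may differ. Sub-blocks in the same group would have their selection means set to "connect", while sub-blocks on a group boundary would stay isolated.

```python
GRID_COLS, GRID_ROWS = 2, 4  # assumed layout: 8 sub-blocks tiling a 16 x 32 MB pair
SB = 8                       # each sub-block covers 8 x 8 pixels


def partition(block_w, block_h):
    """Group sub-block indices into blocks of block_w x block_h pixels.

    Sub-block index = row * GRID_COLS + col (hypothetical numbering).
    """
    bw, bh = block_w // SB, block_h // SB  # block size measured in sub-blocks
    groups = []
    for top in range(0, GRID_ROWS, bh):
        for left in range(0, GRID_COLS, bw):
            groups.append([
                r * GRID_COLS + c
                for r in range(top, top + bh)
                for c in range(left, left + bw)
            ])
    return groups


print(partition(16, 32))  # [[0, 1, 2, 3, 4, 5, 6, 7]]: one MB-pair block
print(partition(16, 16))  # [[0, 1, 2, 3], [4, 5, 6, 7]]: mode1
print(partition(8, 16))   # [[0, 2], [1, 3], [4, 6], [5, 7]]: mode3
```

Every mode uses all eight sub-blocks at once. Under this model, the blocks of one macroblock pair are searched in a single pass rather than one block after another.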

In the image processing apparatus of the second invention in particular, each block can operate independently of the others. The blocks within one macroblock pair can therefore be processed in parallel, so the number of cycles needed to search one macroblock pair does not grow.

In the image processing apparatus of the third invention in particular, each sub-block has a second unit with a plurality of registers. These registers can hold pixel values that have been, or will be, processed by the arithmetic elements of the first unit. Because the pixel values held in the second unit can be reused, fewer pixel values need to be transferred from the buffer to the sub-block during a consecutive-point search. When the selection means leaves the sub-block unconnected to the adjacent sub-block, a ring-shaped path forms within the sub-block. When the selection means connects the two sub-blocks, their first units are linked to each other, and so are their second units.

In the image processing apparatus of the fourth invention in particular, some of the first units are treated like second units. This enlarges the maximum search range that can be covered without an intermediate load.

In the image processing apparatus of the fifth invention in particular, the storage unit holds the pixel values of the image portion adjacent to the portion loaded in the array. This makes it possible to perform a snake search as the FS (Full Search).
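A snake (serpentine) search visits the search positions row by row and reverses direction on each row, so consecutive positions are always one pixel apart. The vertical steps are where the adjacent image portion held in the storage unit is needed. The following sketch generates a generic snake order only. The scan order and search-range size actually used by the apparatus are not given in this section, so both are assumptions here.

```python
def snake_order(range_w, range_h):
    """Search positions (dx, dy) visited by a serpentine full search.

    Even rows run left to right and odd rows run right to left, so each
    step moves the evaluation position by exactly one pixel.
    """
    order = []
    for dy in range(range_h):
        xs = range(range_w) if dy % 2 == 0 else range(range_w - 1, -1, -1)
        order.extend((dx, dy) for dx in xs)
    return order


print(snake_order(3, 2))  # [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
```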

In the image processing apparatus of the sixth invention in particular, the selection means can leave the sub-block unconnected to the adjacent sub-block, which forms a path between the sub-block and the storage unit. When the selection means connects the two sub-blocks, their first units are linked to each other, and so are their second units.

The image processing apparatus of the seventh invention in particular supports both frame images and field images, and therefore both progressive and interlaced formats. It can also compute the evaluation values for frame images and for field images at the same time.

Embodiments of the present invention are described in detail below with reference to the drawings. Elements given the same reference numeral in different drawings are identical or correspond to each other.

FIG. 1 shows the overall configuration of the image processing apparatus according to the present invention. The apparatus is configured as an H.264-compatible IME core 1 (IME: Integer Motion Estimation). The IME core 1 comprises the following modules:

- an SWRAM 2 (Search Window RAM), the buffer for the reference image;
- a TB buffer 3 (TB: Template Block), the buffer for the image to be encoded;
- an image rotation processing unit (crosspath) 4;
- an array 5 with the RRSA configuration;
- a controller 6.

When these modules are mounted in the ME core, the RRSA of the present invention operates without problems.

The IME core 1 uses a two-port SRAM as the SWRAM 2. This SRAM can read a pixel block of 8 horizontal × 8 vertical pixels (hereinafter written "8 × 8"), 8 × 1, or 1 × 8 in one cycle. The SWRAM 2 can also output pixels decimated by 1/2 in the vertical and horizontal directions. FIG. 2 shows the image blocks that can be read from the SWRAM 2. Each of the following can be read in one cycle:

- an 8 × 8 pixel block without decimation;
- a 16 × 8 pixel block decimated by 1/2 horizontally;
- an 8 × 16 block decimated by 1/2 vertically;
- a 16 × 16 block decimated by 1/2 in both directions.
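The four read shapes above share one property: the SWRAM always delivers 64 pixels, and decimation widens the area they come from. The following sketch models this with a stride of 1 or 2 per axis. The function name and the list-of-rows frame are assumptions, and the model ignores how the SRAM banks are organized.

```python
def read_block(frame, x0, y0, step_x=1, step_y=1):
    """Return the 8 x 8 pixels an SWRAM-style read would deliver.

    step_x or step_y = 2 models 1/2 decimation: the read spans 16 pixels in
    that direction but keeps only every other one, so the output is 8 x 8.
    """
    return [
        [frame[y0 + j * step_y][x0 + i * step_x] for i in range(8)]
        for j in range(8)
    ]


frame = [[x + 100 * y for x in range(32)] for y in range(32)]
b = read_block(frame, 0, 0, step_x=2, step_y=2)  # 16 x 16 area, decimated
print(b[0])  # [0, 2, 4, 6, 8, 10, 12, 14]
```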

The TB buffer 3 is a 512-pixel register file (or SRAM) that stores the image data of the macroblock pair to be encoded. It can output an 8 × 8, 8 × 1, or 1 × 8 pixel block of the stored data in one cycle.

The image rotation processing unit 4 rotates the image output from the SWRAM 2 and passes it to the data path. This rotation implements the vertical DS (Directional Search) described later. FIG. 3 shows the arrangement of pixels input to the array 5 when the image is rotated: the original image is rotated 90° counterclockwise.
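The following sketch shows a 90° counterclockwise rotation of a pixel block. After the rotation, row r of the original becomes column r of the rotated block. Moving the search window down by one row in the original therefore becomes a one-column horizontal shift. This is presumably why the rotation lets the array's horizontal shift paths carry out a vertical search. The helper name is an assumption.

```python
def rotate_ccw(block):
    """Rotate a 2-D list (list of rows) 90 degrees counterclockwise."""
    return [list(row) for row in zip(*block)][::-1]


a = [[1, 2, 3],
     [4, 5, 6]]
print(rotate_ccw(a))  # [[3, 6], [2, 5], [1, 4]]
```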

The controller 6 controls each module in the IME core 1, including the array 5. The array 5 performs various search operations according to control signals from the controller 6.

The array 5 with the RRSA configuration is described in detail below. FIG. 4 shows the overall configuration of the array 5. The array 5 is divided into a plurality of sub-block systolic arrays (hereinafter simply "sub-blocks"), SBSA0 to SBSA7; the example of FIG. 4 has eight. The sub-blocks SBSA0 to SBSA7 contain processing units PU0 to PU7 and shift register units SRU0 to SRU7, respectively. Hereinafter, the sub-blocks SBSA0 to SBSA7 are collectively called "sub-blocks SBSA", the processing units PU0 to PU7 "processing units PU", and the shift register units SRU0 to SRU7 "shift register units SRU". A processing unit PU is a unit with a plurality of arithmetic elements. A shift register unit SRU is a unit with a plurality of shift register elements, which hold pixel values that have been, or will be, processed by the arithmetic elements of the processing unit PU.

A sub-block SBSA is a systolic array module that efficiently computes evaluation values in units of 8 × 8. The evaluation value computation unit assumes SAD (Sum of Absolute Differences) as the evaluation value, but other evaluation values can also be used. A sub-block SBSA comprises one 8 × 8-parallel processing unit PU and one 8 × 8-parallel shift register unit SRU.

A sub-block SBSA receives eight inputs from outside itself. Their pixel sizes are as follows:
(I1) from the SWRAM 2 to the processing unit PU: 8 × 8 pixels;
(I2) from the SWRAM 2 to the shift register unit SRU: 8 × 8 pixels;
(I3) from the processing unit PU of the horizontally adjacent sub-block SBSA to the processing unit PU: 1 × 8 pixels;
(I4) from the shift register unit SRU of the horizontally adjacent sub-block SBSA to the shift register unit SRU: 1 × 8 pixels;
(I5) from the processing unit PU of the sub-block SBSA adjacent below to the processing unit PU: 8 × 1 pixels;
(I6) from the shift register unit SRU of the sub-block SBSA adjacent below to the shift register unit SRU: 8 × 1 pixels;
(I7) from the storage unit 7 to the processing unit PU: 8 × 1 pixels;
(I8) from the storage unit 7 to the shift register unit SRU: 8 × 1 pixels.

A sub-block SBSA also has an internal connection: a bidirectional 1 × 8-pixel input/output path between the processing unit PU and the shift register unit SRU. This path forms a ring. In the example of FIG. 4, it has four connections:

- the right output of the shift register unit SRU connects to the left input of the processing unit PU;
- the left output of the SRU connects to the right input of the PU;
- the right output of the PU connects to the left input of the SRU;
- the left output of the PU connects to the right input of the SRU.
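The four connections above close the 8 PU columns and the 8 SRU columns into one 16-column ring that can rotate in either direction. The following sketch models this as a toy, with each column standing for one 1 × 8 pixel column. In the example, the PU is loaded with reference columns 0 to 7 and the SRU with columns 8 to 15. Each left shift then advances the search position by one pixel without a new RAM read. The class and method names are hypothetical.

```python
from collections import deque


class SubBlockRing:
    """Toy model of one sub-block's PU/SRU column ring (8 + 8 columns)."""

    def __init__(self, pu_cols, sru_cols):
        assert len(pu_cols) == len(sru_cols) == 8
        self.ring = deque(pu_cols + sru_cols)  # PU columns, then SRU columns

    def shift_left(self):
        # The PU's leftmost column enters the SRU's right input, and the
        # SRU's leftmost column enters the PU's right input.
        self.ring.rotate(-1)

    def shift_right(self):
        # The PU's rightmost column enters the SRU's left input, and the
        # SRU's rightmost column enters the PU's left input.
        self.ring.rotate(1)

    def pu(self):
        """Columns currently inside the PU (the ones being evaluated)."""
        return list(self.ring)[:8]


ring = SubBlockRing(list(range(8)), list(range(8, 16)))
ring.shift_left()
print(ring.pu())  # [1, 2, 3, 4, 5, 6, 7, 8]
```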

A sub-block SBSA also has four pixel outputs to the outside of the sub-block. Their pixel sizes are as follows:
(O1) from the processing unit PU to the processing unit PU of the horizontally adjacent sub-block SBSA: 1 × 8 pixels;
(O2) from the shift register unit SRU to the shift register unit SRU of the horizontally adjacent sub-block SBSA: 1 × 8 pixels;
(O3) from the processing unit PU to the processing unit PU of the sub-block SBSA adjacent above: 8 × 1 pixels;
(O4) from the shift register unit SRU to the shift register unit SRU of the sub-block SBSA adjacent above: 8 × 1 pixels.

Each sub-block SBSA also outputs its evaluation value results (SADs in this example) to the outside of the array 5. As FIG. 6, described later, shows, each sub-block SBSA outputs four Frame_4×4_SAD values for frame images and four Field_4×4_SAD values for field images. Summing these results from the eight sub-blocks SBSA0 to SBSA7 in different ways enables sub-search techniques. For example, during a search with a 16 × 32 macroblock pair, the SADs of mode1 to mode4 (16 × 16, 16 × 8, 8 × 16, 8 × 8) can be computed at the same search point.
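Eight sub-blocks with four Frame_4×4_SAD values each give 32 tile SADs, which form an 8-row × 4-column grid covering the 16 × 32 macroblock pair. The following sketch sums that grid into the SADs of every partition of a given size. The grid orientation (row r, column c covers the pixels starting at (4c, 4r)) is an assumption, and field SADs are left out.

```python
def block_sads(sad4, block_w, block_h):
    """Sum 4x4 tile SADs into the SADs of block_w x block_h partitions.

    sad4[r][c] is the SAD of the 4x4 tile whose top-left pixel is (4c, 4r).
    Returns a 2-D list with one SAD per partition, in raster order.
    """
    tw, th = block_w // 4, block_h // 4  # partition size in tiles
    rows, cols = len(sad4), len(sad4[0])
    return [
        [
            sum(sad4[r][c]
                for r in range(top, top + th)
                for c in range(left, left + tw))
            for left in range(0, cols, tw)
        ]
        for top in range(0, rows, th)
    ]


ones = [[1] * 4 for _ in range(8)]  # every tile SAD = 1
print(block_sads(ones, 16, 16))     # [[16], [16]]: two mode1 blocks
```

Because all four modes come from the same tile SADs, one pass over a search point yields every mode's SAD at once.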

Depending on the operating state, the sub-block switches between two sources on each side. The external input from the horizontally adjacent sub-block SBSA alternates with the internal PU-SRU connection. Likewise, the external input from the sub-block SBSA adjacent below alternates with the external input from the storage unit 7.

FIG. 5 shows the internal configuration of a sub-block SBSA. FIG. 6 shows the internal configuration of the evaluation value computation part (the arithmetic elements PE and the adders) of the processing unit PU. FIG. 7 shows the internal configuration of a single arithmetic element PE.

<Processing unit PU>
Referring to FIGS. 5 to 7, the processing unit PU comprises three parts: an 8 × 8-pixel data buffer, 64 (= 8 × 8) arithmetic elements PE that each compute the evaluation value for one pixel, and adders 12 and 13 that sum the evaluation values according to the search state.

Referring to FIG. 5, the processing unit PU has a multiplexer 10A for each row of the PE matrix. The multiplexer 10A is the selection means that determines whether the sub-block SBSA is connected to the adjacent sub-block SBSA. Specifically, it selects the input to the sub-block's own processing unit PU: either the sub-block's own shift register unit SRU or the processing unit PU of the adjacent sub-block SBSA.

Similarly, referring to FIG. 5, the shift register unit SRU has a multiplexer 10B for each row of the SRE matrix. The multiplexer 10B is the selection means that determines whether the sub-block SBSA is connected to the adjacent sub-block SBSA. Specifically, it selects the input to the sub-block's own shift register unit SRU: either the sub-block's own processing unit PU or the shift register unit SRU of the adjacent sub-block SBSA.

The settings of the multiplexers 10A and 10B depend on two factors. The first is the size of the image to be processed: macroblock pair (16 × 32), mode1 (16 × 16), mode2 (16 × 8), mode3 (8 × 16), or mode4 (8 × 8). The second is the search mode to be executed: FS mode, DS mode, or RBM mode. Details are described later.

As FIG. 6 shows, the processing unit PU computes evaluation values as SADs over 4 × 4 pixels. To support both frame images and field images, it has four frame-image adders 12 and four field-image adders 13. Each adder 12 sums and outputs the SADs of the arithmetic elements PE in four consecutive rows (16 elements in total). Each adder 13 sums and outputs the SADs of the arithmetic elements PE in four alternate rows, skipping every other row (16 elements in total).
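The following sketch shows the two adder groups operating on the 8 × 8 absolute differences produced by one sub-block's 64 PEs. The frame tiles use rows 0 to 3 and rows 4 to 7, while the field tiles use the even rows and the odd rows. Each row group is combined with the left and right column halves. Pairing the row groups with column halves in this way is an assumption about FIG. 6.

```python
def frame_field_sads(ad):
    """ad: 8x8 absolute differences |TB - SW| from one sub-block's PEs.

    Returns (frame, field), four 4x4 SADs each, in the order
    [top/even-left, top/even-right, bottom/odd-left, bottom/odd-right].
    """
    def tile(rows, cols):
        return sum(ad[r][c] for r in rows for c in cols)

    halves = (range(0, 4), range(4, 8))  # left / right column halves
    frame_rows = (range(0, 4), range(4, 8))         # consecutive rows (adders 12)
    field_rows = (range(0, 8, 2), range(1, 8, 2))   # alternate rows (adders 13)
    frame = [tile(rows, cols) for rows in frame_rows for cols in halves]
    field = [tile(rows, cols) for rows in field_rows for cols in halves]
    return frame, field


ad = [[r] * 8 for r in range(8)]  # row r has difference r everywhere
print(frame_field_sads(ad))       # ([24, 24, 88, 88], [48, 48, 64, 64])
```

Both groups read the same 64 PE outputs, which is why the frame and field evaluations can run simultaneously.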

ここで、ｙ方向（縦方向）に１／２に間引かれた画素値がＳＷＲＡＭ２からアレイ５に入力されている場合には、Frame_4×4_SADは、ｙ方向間引きなしのフィールド画像のＳＡＤ（又はｙ方向に１／２に間引かれたフレーム画像のＳＡＤ）に相当し、また、Field_4×4_SADは、ｙ方向に１／２に間引かれたフィールド画像のＳＡＤに相当する。 Here, suppose pixel values thinned by 1/2 in the y direction (vertical direction) are input from the SWRAM 2 to the array 5. In that case, Frame_4×4_SAD corresponds to the SAD of a field image without y-direction thinning, or to the SAD of a frame image thinned by 1/2 in the y direction. Field_4×4_SAD corresponds to the SAD of a field image thinned by 1/2 in the y direction.

垂直方向ＤＳ（直線探索）を行う場合には、入力画像の回転に応じてField_4×4_SADが演算される。そのため、垂直方向ＤＳにおいてもField_4×4_SADを利用した副探索が可能である。 When a vertical DS (straight-line search) is performed, Field_4×4_SAD is computed in accordance with the rotation of the input image. Consequently, a sub-search using Field_4×4_SAD is also possible in the vertical DS.

プロセッシングユニットＰＵは、８×８画素サイズのデータバッファを有している（図７の“register”）。このバッファは、参照画像ＳＷの画素値及び符号化対象画像ＴＢの画素値を、８×８サイズで保持可能である。初期ロードとして、参照画像ＳＷ用のバッファは、ＳＷＲＡＭ２から８×８画素の画素値の供給を受け、これを１サイクルで保持することができる。また、プロセッシングユニットＰＵは、内部に保持している参照画像ＳＷの画素値を、左右に１画素分シフトすることが可能である。このシフトによって溢れた画素値は、同一サブブロックＳＢＳＡ内のシフトレジスタユニットＳＲＵ、又は横隣接サブブロックＳＢＳＡ内のプロセッシングユニットＰＵへと供給することが可能である。評価値演算部を使用しない（つまり演算素子ＰＥからの出力を破棄する）という設定も可能であり、この場合、プロセッシングユニットＰＵは、８×８サイズのシフトレジスタとして動作する。さらに、プロセッシングユニットＰＵは、下方向からの画素値の供給を受けて、内部に保持している画素値を縦方向に１画素分シフトすることができる。このシフトによって溢れた画素値は、上隣接サブブロックＳＢＳＡ内のプロセッシングユニットＰＵへと供給することが可能である。 The processing unit PU has a data buffer of 8 × 8 pixels (“register” in FIG. 7). This buffer can hold 8 × 8 pixel values of the reference image SW and 8 × 8 pixel values of the encoding target image TB. In the initial load, the buffer for the reference image SW receives 8 × 8 pixel values from the SWRAM 2 and stores them in one cycle.

The processing unit PU can also shift the reference image SW pixel values it holds by one pixel to the left or right. The pixel values pushed out by this shift can be supplied to one of two destinations: the shift register unit SRU in the same sub-block SBSA, or the processing unit PU in the horizontally adjacent sub-block SBSA.

The processing unit PU can also be configured not to use its evaluation value calculation unit, that is, to discard the outputs of the computing elements PE. In that case it operates as an 8 × 8 shift register.

Furthermore, the processing unit PU can receive pixel values supplied from below and shift the pixel values it holds vertically by one pixel. The pixel values pushed out by this shift can be supplied to the processing unit PU in the upper adjacent sub-block SBSA.

＜シフトレジスタユニットＳＲＵ＞ <Shift register unit SRU>
シフトレジスタユニットＳＲＵは、８×８画素の合計６４画素分の画素値バッファであり、８×８個のシフトレジスタ素子ＳＲＥを備えて構成されている。シフトレジスタ素子ＳＲＥ単体の内部構成を図８に示す。初期ロードとして、シフトレジスタユニットＳＲＵはＳＷＲＡＭ２から８×８画素の画素値の供給を受け、これを１サイクルで保持することができる。シフトレジスタユニットＳＲＵは、内部に保持している画素値を左右に１画素分シフトすることができ、このシフトによって溢れた画素値は、同一サブブロックＳＢＳＡ内のプロセッシングユニットＰＵ、又は横隣接サブブロックＳＢＳＡ内のシフトレジスタユニットＳＲＵへと供給することが可能である。また、シフトレジスタユニットＳＲＵは、下方向からの画素値の供給を受けて、保持している画素値を縦方向に１画素分シフトすることができる。このシフトによって溢れた画素値は、上隣接サブブロックＳＢＳＡ内のシフトレジスタユニットＳＲＵへと供給することが可能である。 The shift register unit SRU is a pixel value buffer for 8 × 8 = 64 pixels and consists of 8 × 8 shift register elements SRE. FIG. 8 shows the internal configuration of a single shift register element SRE. In the initial load, the shift register unit SRU receives 8 × 8 pixel values from the SWRAM 2 and stores them in one cycle.

The shift register unit SRU can shift the pixel values it holds by one pixel to the left or right. The pixel values pushed out by this shift can be supplied to one of two destinations: the processing unit PU in the same sub-block SBSA, or the shift register unit SRU in the horizontally adjacent sub-block SBSA.

The shift register unit SRU can also receive pixel values supplied from below and shift the pixel values it holds vertically by one pixel. The pixel values pushed out by this shift can be supplied to the shift register unit SRU in the upper adjacent sub-block SBSA.

図４を参照して、記憶部（ＲＥＧ＿ＶＳ）７は、ＦＳ（Full Search）動作時の縦方向探索を実現するためのデータバッファ用レジスタである。記憶部７の内部構成を図９に示す。記憶部７への外部からの入力としては、ＳＷＲＡＭ２からの８×１画素の画素値入力が存在する。また、記憶部７から外部への出力としては、各サブブロックＳＢＳＡのプロセッシングユニットＰＵへの８×１画素の画素値出力、及び各サブブロックＳＢＳＡのシフトレジスタユニットＳＲＵへの８×１画素の画素値出力が存在する。記憶部７は、８×１画素単位の画素値バッファ１５を１６個（プロセッシングユニットＰＵ用に８個、シフトレジスタユニットＳＲＵ用に８個）備えて構成されている。ＳＷＲＡＭ２から入力された画素値データは、いずれかの画素値バッファ１５に入力され、保持される。記憶部７は、画素値バッファ１５に保持している全ての画素値データを、対応するサブブロックＳＢＳＡ（プロセッシングユニットＰＵ及びシフトレジスタユニットＳＲＵ）に同時に出力することができる。 Referring to FIG. 4, the storage unit (REG_VS) 7 is a data buffer register for realizing the vertical search during FS (Full Search) operation. FIG. 9 shows the internal configuration of the storage unit 7.

The external input to the storage unit 7 is an 8 × 1 pixel value input from the SWRAM 2. The storage unit 7 has two kinds of output, both going to every sub-block SBSA: an 8 × 1 pixel value output to its processing unit PU, and an 8 × 1 pixel value output to its shift register unit SRU.

The storage unit 7 contains sixteen pixel value buffers 15 of 8 × 1 pixels each: eight for the processing units PU and eight for the shift register units SRU. Pixel value data input from the SWRAM 2 is written into one of the pixel value buffers 15 and held there. The storage unit 7 can output all the pixel value data held in the pixel value buffers 15 simultaneously to the corresponding sub-blocks SBSA (processing units PU and shift register units SRU).

次に、本実施の形態に係る画像処理装置の動作について説明する。 Next, the operation of the image processing apparatus according to this embodiment will be described.

以下では、実現可能な各探索手法における全体動作について説明する。本実施の形態に係る画像処理装置で実行可能な探索手法には、（１）ＦＳ（Full Search）、（２）ＤＳ（Directional Search）、及び（３）ＲＢＭ（Random Block Matching）が含まれる。また、ＲＲＳＡ構成では、探索手法だけでなく、探索を実行するブロックサイズによっても構成を変える。対応可能なブロックサイズには、１６×３２、１６×１６、１６×８、８×１６、及び８×８が含まれる。 The overall operation of each supported search method is described below. The search methods that the image processing apparatus according to this embodiment can execute include (1) FS (Full Search), (2) DS (Directional Search) and (3) RBM (Random Block Matching). In the RRSA configuration, the configuration changes with the search method and also with the block size on which the search is executed. The supported block sizes include 16 × 32, 16 × 16, 16 × 8, 8 × 16 and 8 × 8.

サブブロックＳＢＳＡの基本動作として、初期ロード、水平シフト動作、及び垂直シフト動作が定義されている。ＲＲＳＡ構成では、探索手法に応じてこれらの基本動作を適宜に組み合わせることによって、様々な探索手法を同一のアーキテクチャによって実現している。 An initial load, a horizontal shift operation, and a vertical shift operation are defined as basic operations of the sub-block SBSA. In the RRSA configuration, various search methods are realized by the same architecture by appropriately combining these basic operations according to the search method.

＜初期ロード＞ <Initial load>
プロセッシングユニットＰＵ、シフトレジスタユニットＳＲＵ、及び記憶部７が初期的なロードで保持する画素値についての詳細を図１０，１１に示す。図１０は画像を回転させない場合について示しており、図１１は画像を回転させる場合について示している。 FIGS. 10 and 11 show details of the pixel values that the processing unit PU, the shift register unit SRU and the storage unit 7 hold after the initial load. FIG. 10 shows the case where the image is not rotated, and FIG. 11 shows the case where the image is rotated.

図１０を参照して、プロセッシングユニットＰＵには、８×８の画像部分の画素値（図１０のａ〜ｈ）が、１サイクルでロードされる。シフトレジスタユニットＳＲＵには、プロセッシングユニットＰＵにロードされる画像部分に右隣接する８×８の画像部分の画素値（図１０の“０”〜“７”）が、１サイクルでロードされる。記憶部７には、プロセッシングユニットＰＵにロードされる画像部分に下隣接する８×１の画像部分の画素値（図１０の“ｕ”）が、１サイクルでロードされる。また、記憶部７には、シフトレジスタユニットＳＲＵにロードされる画像部分に下隣接する８×１の画像部分の画素値（図１０の“ｖ”）が、１サイクルでロードされる。 Referring to FIG. 10, each of the following loads takes one cycle:

- **Processing unit PU:** the pixel values of an 8 × 8 image portion (a to h in FIG. 10).
- **Shift register unit SRU:** the pixel values of the 8 × 8 image portion immediately to the right of the PU portion (“0” to “7” in FIG. 10).
- **Storage unit 7:** the pixel values of the 8 × 1 image portion immediately below the PU portion (“u” in FIG. 10).
- **Storage unit 7:** the pixel values of the 8 × 1 image portion immediately below the SRU portion (“v” in FIG. 10).
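The spatial relationship of the unrotated initial load can be sketched by slicing a search window. This is a hypothetical illustration:

- `sw` is assumed to be a list of pixel rows.
- `(x, y)` is assumed to be the top-left corner of the PU region.

Only the relative placement comes from the description: the PU block, the right-adjacent SRU block, and the rows "u" and "v" directly below them.

```python
def initial_load(sw, x, y):
    """Slice the unrotated initial-load contents out of a search window.

    sw is a list of pixel rows; (x, y) is the hypothetical top-left corner of the
    8 x 8 region loaded into the processing unit PU.
    Returns (pu, sru, u, v):
      pu  - 8 x 8 block at (x, y)                ("a".."h" in FIG. 10)
      sru - 8 x 8 block immediately to its right ("0".."7")
      u   - 8 x 1 row immediately below pu       ("u")
      v   - 8 x 1 row immediately below sru      ("v")
    """
    if y + 9 > len(sw) or x + 16 > len(sw[0]):
        raise ValueError("window too small for a PU + SRU load at (x, y)")
    pu = [row[x:x + 8] for row in sw[y:y + 8]]
    sru = [row[x + 8:x + 16] for row in sw[y:y + 8]]
    u = sw[y + 8][x:x + 8]
    v = sw[y + 8][x + 8:x + 16]
    return pu, sru, u, v


# Example window whose pixel value encodes its position as 100 * row + column.
sw = [[100 * r + c for c in range(24)] for r in range(16)]
pu, sru, u, v = initial_load(sw, 2, 3)
print(pu[0][0], sru[0][0], u[0], v[-1])  # 302 310 1102 1117
```

In this model, one initial load places 64 pixel values each in the PU and the SRU. It also produces two 8-pixel rows destined for storage unit 7.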

図９を参照して、初期ロードで記憶部７内の１６個全ての画素値バッファ１５に画素値を同時にロードすることは不可能であるため、各画素値バッファ１５毎に順に画素値をロードさせていく必要がある。 Referring to FIG. 9, pixel values cannot be loaded into all sixteen pixel value buffers 15 of the storage unit 7 simultaneously during the initial load. The pixel values must therefore be loaded into the pixel value buffers 15 one after another.
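The sequential filling of storage unit 7 can be modeled as follows. The class name is hypothetical, and the one-buffer-per-cycle write rate is an assumption made for illustration. The description states only two things: the sixteen buffers cannot all be loaded at once, and once filled, all of them can be output simultaneously.

```python
class StorageUnitModel:
    """Hypothetical model of storage unit 7 (REG_VS): sixteen 8 x 1 buffers 15."""

    NUM_BUFFERS = 16  # 8 feeding the PUs, 8 feeding the SRUs
    WIDTH = 8

    def __init__(self):
        self.buffers = [None] * self.NUM_BUFFERS
        self.cycles = 0

    def load(self, index, row):
        """Write one 8 x 1 row; assumed here to take one cycle per buffer."""
        if len(row) != self.WIDTH:
            raise ValueError("each buffer holds exactly 8 pixel values")
        self.buffers[index] = list(row)
        self.cycles += 1

    def ready(self):
        return all(b is not None for b in self.buffers)

    def output_all(self):
        """Return every buffer at once, mirroring the simultaneous output."""
        if not self.ready():
            raise RuntimeError("not every buffer has been loaded yet")
        return [list(b) for b in self.buffers]


unit = StorageUnitModel()
for i in range(StorageUnitModel.NUM_BUFFERS):
    unit.load(i, [i] * 8)
print(unit.cycles, unit.ready())  # 16 True
```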

図１１を参照して、プロセッシングユニットＰＵには、８×８の画像部分の画素値（図１１のａ〜ｈ）が、画像回転処理部４（図１参照）によって反時計回りに９０°回転されつつ、１サイクルでロードされる。また、シフトレジスタユニットＳＲＵには、プロセッシングユニットＰＵにロードされる画像部分に下隣接する８×８の画像部分の画素値（図１０の“０”〜“７”）が、画像回転処理部４によって反時計回りに９０°回転されつつ、１サイクルでロードされる。 Referring to FIG. 11, the pixel values of an 8 × 8 image portion (a to h in FIG. 11) are loaded into the processing unit PU in one cycle. During the load they are rotated 90° counterclockwise by the image rotation processing unit 4 (see FIG. 1). The shift register unit SRU receives the pixel values of the 8 × 8 image portion immediately below the PU portion (“0” to “7” in FIG. 10). These are likewise loaded in one cycle while being rotated 90° counterclockwise by the image rotation processing unit 4.
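The following minimal sketch shows the rotated initial load. It assumes `sw` is a list of pixel rows and uses hypothetical helper names.

It illustrates why the SRU receives the block below the PU block rather than the one to its right. A 90° counterclockwise rotation moves what lay below to the right. Subsequent horizontal shifts therefore step through the original image vertically.

```python
def rotate_ccw(block):
    """Rotate a square block (list of rows) 90 degrees counterclockwise."""
    n = len(block)
    return [[block[c][n - 1 - r] for c in range(n)] for r in range(n)]


def rotated_initial_load(sw, x, y):
    """Return the rotated (pu, sru) contents for the vertical-search case.

    pu  <- 8 x 8 block at (x, y), rotated counterclockwise
    sru <- 8 x 8 block immediately below it, rotated counterclockwise
    """
    pu_src = [row[x:x + 8] for row in sw[y:y + 8]]
    sru_src = [row[x:x + 8] for row in sw[y + 8:y + 16]]
    return rotate_ccw(pu_src), rotate_ccw(sru_src)


sw = [[100 * r + c for c in range(8)] for r in range(16)]
pu, sru = rotated_initial_load(sw, 0, 0)
# Row 0 of the PU followed by row 0 of the SRU walks straight down image column 7.
print(pu[0] + sru[0] == [100 * r + 7 for r in range(16)])  # True
```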

＜左シフト動作＞ <Left shift operation>
左シフト動作は、現在の保持状態から左方向に画素値をシフトする動作であり、直線連続点探索の基本となる動作である。探索としては、左から右に向かって連続点探索を行う動作となる。左シフト動作を図１２に示す。シフトレジスタユニットＳＲＵの左端列８画素分の画素値（図１２の“０”）が、プロセッシングユニットＰＵに供給されて、プロセッシングユニットＰＵの右端列８画素に保持される。プロセッシングユニットＰＵから溢れた画素値（図１２の“ａ”）は、シフトレジスタユニットＳＲＵの右端列８画素に保持させることができる。 The left shift operation shifts the held pixel values one position to the left from the current state. It is the basic operation for searching consecutive points along a straight line. In terms of the search, it advances a consecutive-point search from left to right. FIG. 12 shows the left shift operation.

The 8 pixel values of the leftmost column of the shift register unit SRU (“0” in FIG. 12) are supplied to the processing unit PU. They are held in its rightmost column of 8 pixels. The pixel values that overflow from the processing unit PU (“a” in FIG. 12) can be held in the rightmost column of 8 pixels of the shift register unit SRU.
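The left shift can be modeled as a rotation of the 16-column ring formed by one PU and one SRU. This Python sketch is illustrative: the function name and the list-of-rows representation are assumptions. It ignores the evaluation-value logic and tracks only where each column goes.

```python
def left_shift(pu, sru):
    """One left-shift step of a PU/SRU pair (FIG. 12).

    Each argument is a list of rows. The SRU's leftmost column ("0") enters the
    PU's rightmost column, and the PU's leftmost column ("a"), which is pushed
    out, is kept in the SRU's rightmost column, so the pair acts as a ring.
    """
    new_pu = [p[1:] + [s[0]] for p, s in zip(pu, sru)]
    new_sru = [s[1:] + [p[0]] for p, s in zip(pu, sru)]
    return new_pu, new_sru


# Label each column by its horizontal position in the reference image.
pu = [list(range(0, 8)) for _ in range(8)]
sru = [list(range(8, 16)) for _ in range(8)]
for _ in range(8):
    # Each shift presents the PU with the next candidate position to the right.
    pu, sru = left_shift(pu, sru)
print(pu[0], sru[0])  # [8, 9, 10, 11, 12, 13, 14, 15] [0, 1, 2, 3, 4, 5, 6, 7]
```

After eight left shifts the PU holds the columns that were originally in the SRU. Eight further candidate positions have thus been evaluated without any intermediate load.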

＜右シフト動作＞ <Right shift operation>
右シフト動作は、現在の保持状態から右方向に画素値をシフトする動作であり、ＦＳ動作としてスネークサーチ（図１５参照）を実現するために必要となる動作である。探索としては、右から左に向かって連続点探索を行う動作となる。右シフト動作を図１３に示す。プロセッシングユニットＰＵの右端列８画素分の画素値（図１３の“ｈ”）が、シフトレジスタユニットＳＲＵに供給されて、シフトレジスタユニットＳＲＵの左端列８画素に保持される。シフトレジスタユニットＳＲＵから溢れた画素値（図１３の“７”）は、プロセッシングユニットＰＵの左端列８画素に保持させることができる。 The right shift operation shifts the held pixel values one position to the right from the current state. It is required to realize the snake search (see FIG. 15) used for FS. In terms of the search, it advances a consecutive-point search from right to left. FIG. 13 shows the right shift operation.

The 8 pixel values of the rightmost column of the processing unit PU (“h” in FIG. 13) are supplied to the shift register unit SRU. They are held in its leftmost column of 8 pixels. The pixel values that overflow from the shift register unit SRU (“7” in FIG. 13) can be held in the leftmost column of 8 pixels of the processing unit PU.
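The right shift is the mirror image of the left shift on the same PU/SRU ring. In this hypothetical sketch, the function name and list-of-rows representation are assumptions. The starting state is the one left after eight left shifts: the PU holds columns 8 to 15 and the SRU holds columns 0 to 7, as at the end of a left-to-right pass of a snake search.

```python
def right_shift(pu, sru):
    """One right-shift step of a PU/SRU pair (FIG. 13).

    The PU's rightmost column ("h") enters the SRU's leftmost column, and the
    SRU's rightmost column ("7"), which is pushed out, is kept in the PU's
    leftmost column.
    """
    new_pu = [[s[-1]] + p[:-1] for p, s in zip(pu, sru)]
    new_sru = [[p[-1]] + s[:-1] for p, s in zip(pu, sru)]
    return new_pu, new_sru


# State after a left-to-right pass: PU sees columns 8-15, SRU keeps 0-7.
pu = [list(range(8, 16)) for _ in range(8)]
sru = [list(range(0, 8)) for _ in range(8)]
pu, sru = right_shift(pu, sru)
print(pu[0], sru[0])  # [7, 8, 9, 10, 11, 12, 13, 14] [15, 0, 1, 2, 3, 4, 5, 6]
```

The PU now holds columns 7 to 14. The candidate position has therefore moved one column back to the left, as the snake search requires on its right-to-left passes.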

＜上シフト動作＞ <Up-shift operation>
上シフト動作は、現在の保持状態から上方向に画素値をシフトする動作であり、ＦＳ動作としてスネークサーチ（図１５参照）を実現するために必要となる動作である。探索としては、上から下に向かって１画素分だけシフトする動作となる。上シフト動作を図１４に示す。記憶部７（又は下隣接サブブロックＳＢＳＡ内のプロセッシングユニットＰＵ）に保持されている８×１画素の画素値（図１４の“ｕ”）が、プロセッシングユニットＰＵに供給されて、プロセッシングユニットＰＵの下端行８画素に保持される。プロセッシングユニットＰＵから溢れた上端行８画素分の画素値は、上隣接サブブロックＳＢＳＡ内のプロセッシングユニットＰＵの下端行８画素に保持されるか、破棄される。 The up-shift operation shifts the held pixel values one row upward from the current state. It is required to realize the snake search (see FIG. 15) used for FS. In terms of the search, it moves the search position down by one pixel. FIG. 14 shows the up-shift operation.

The 8 × 1 pixel values (“u” in FIG. 14) come from the storage unit 7, or from the processing unit PU of the lower adjacent sub-block SBSA. They are supplied to the processing unit PU and held in its bottom row of 8 pixels. The 8 pixel values of the top row that overflow from the processing unit PU are either discarded or held in the bottom row of the processing unit PU in the upper adjacent sub-block SBSA.

また、記憶部７（又は下隣接サブブロックＳＢＳＡ内のシフトレジスタユニットＳＲＵ）に保持されている８×１画素の画素値（図１４の“ｖ”）が、シフトレジスタユニットＳＲＵに供給されて、シフトレジスタユニットＳＲＵの下端行８画素に保持される。シフトレジスタユニットＳＲＵから溢れた上端行８画素分の画素値は、上隣接サブブロックＳＢＳＡ内のシフトレジスタユニットＳＲＵの下端行８画素に保持されるか、破棄される。 Likewise, the 8 × 1 pixel values (“v” in FIG. 14) come from the storage unit 7, or from the shift register unit SRU of the lower adjacent sub-block SBSA. They are supplied to the shift register unit SRU and held in its bottom row of 8 pixels. The 8 pixel values of the top row that overflow from the shift register unit SRU are either discarded or held in the bottom row of the shift register unit SRU in the upper adjacent sub-block SBSA.
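The up shift of a vertical chain of PUs (or SRUs) can be sketched as below. The function names are hypothetical. The model makes three assumptions:

- The units are listed from the top sub-block down.
- Storage unit 7 feeds the bottom unit.
- The top unit's overflow row is discarded.

```python
def up_shift(unit, row_from_below):
    """One up-shift step (FIG. 14) for a single PU or SRU (list of rows).

    Returns the new contents and the top row pushed out, which the caller either
    passes to the unit above or discards.
    """
    if len(row_from_below) != len(unit[0]):
        raise ValueError("incoming row must match the unit width")
    overflow = unit[0]
    return unit[1:] + [list(row_from_below)], overflow


def up_shift_chain(units, row_from_storage):
    """Up-shift vertically connected units; units[0] is the top sub-block."""
    incoming = row_from_storage
    new_units = [None] * len(units)
    for i in range(len(units) - 1, -1, -1):  # bottom unit first
        new_units[i], incoming = up_shift(units[i], incoming)
    return new_units  # the final overflow from the top unit is discarded


top = [[r] * 8 for r in range(0, 8)]      # holds image rows 0-7
bottom = [[r] * 8 for r in range(8, 16)]  # holds image rows 8-15
new_top, new_bottom = up_shift_chain([top, bottom], [16] * 8)  # "u" = row 16
print([row[0] for row in new_top])     # [1, 2, 3, 4, 5, 6, 7, 8]
print([row[0] for row in new_bottom])  # [9, 10, 11, 12, 13, 14, 15, 16]
```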

＜ＦＳ（Full Search）＞ <FS (Full Search)>
ＦＳは一般的な探索手法であり、探索範囲として指定した矩形領域を網羅的に探索する手法である。ＲＲＳＡ構成では、ＦＳをスネークサーチと呼ばれる方法で実現する。スネークサーチにおける矩形領域内の探索順を図１５に示す。また、ＦＳ時におけるサブブロックＳＢＳＡ内の内部結線状態を図１６〜２０に示す。図１６はマクロブロックペア（１６×３２）に対応し、図１７はｍｏｄｅ１（１６×１６）に対応し、図１８はｍｏｄｅ２（１６×８）に対応し、図１９はｍｏｄｅ３（８×１６）に対応し、図２０はｍｏｄｅ４（８×８）に対応する。 FS is a common search method that exhaustively searches a rectangular area specified as the search range. In the RRSA configuration, FS is realized by a method called snake search. FIG. 15 shows the search order within the rectangular area in the snake search.

FIGS. 16 to 20 show the internal connection state of the sub-blocks SBSA during FS:

- FIG. 16: macroblock pair (16 × 32)
- FIG. 17: mode 1 (16 × 16)
- FIG. 18: mode 2 (16 × 8)
- FIG. 19: mode 3 (8 × 16)
- FIG. 20: mode 4 (8 × 8)

ブロックサイズがマクロブロックペア、ｍｏｄｅ１、ｍｏｄｅ２である場合は、図１６〜１８に示すように、横隣接サブブロックＳＢＳＡ間で接続パスが形成され、一方、ｍｏｄｅ３、ｍｏｄｅ４である場合は、図１９，２０に示すように、横隣接サブブロックＳＢＳＡ間で接続パスは形成されない。 When the block size is a macroblock pair, mode 1 or mode 2, a connection path is formed between horizontally adjacent sub-blocks SBSA, as shown in FIGS. 16 to 18. When it is mode 3 or mode 4, no connection path is formed between horizontally adjacent sub-blocks SBSA, as shown in FIGS. 19 and 20.

下方向からの入力もブロックサイズに応じて決定され、ブロックサイズがマクロブロックペア、ｍｏｄｅ１、ｍｏｄｅ３である場合は、図１６，１７，１９に示すように、縦隣接サブブロックＳＢＳＡ間で接続パスが形成され、一方、ｍｏｄｅ２、ｍｏｄｅ４である場合は、図１８，２０に示すように、記憶部７との間で接続パスが形成される。但し、図１６，１７，１９においても、一番下に位置するサブブロックＳＢＳＡは、記憶部７との間で接続パスを形成している。 The input from below is also determined by the block size. When the block size is a macroblock pair, mode 1 or mode 3, a connection path is formed between vertically adjacent sub-blocks SBSA, as shown in FIGS. 16, 17 and 19. When it is mode 2 or mode 4, a connection path is formed with the storage unit 7, as shown in FIGS. 18 and 20. Even in FIGS. 16, 17 and 19, however, the bottommost sub-block SBSA forms a connection path with the storage unit 7.
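The connection rules of the two preceding paragraphs reduce to a simple test on the block dimensions, sketched here in Python. The dictionary keys and the `fs_links` name are hypothetical. The rule is stated as "the block is wider (or taller) than one 8 × 8 sub-block". That wording is an editorial reading, but it reproduces every case listed for FIGS. 16 to 20.

```python
# Block sizes (width, height) in pixels, as listed in the description.
BLOCK_SIZES = {
    "mb_pair": (16, 32),
    "mode1": (16, 16),
    "mode2": (16, 8),
    "mode3": (8, 16),
    "mode4": (8, 8),
}
SUB_BLOCK = 8  # one sub-block SBSA covers 8 x 8 pixels


def fs_links(mode):
    """Connection pattern used during FS for a given block size.

    horizontal_link: the PU/SRU chains of horizontally adjacent sub-blocks are joined.
    vertical_link:   input from below comes from the lower adjacent sub-block;
                     otherwise it comes from storage unit 7. (The bottommost
                     sub-block always takes its input from storage unit 7.)
    """
    width, height = BLOCK_SIZES[mode]
    return {"horizontal_link": width > SUB_BLOCK,
            "vertical_link": height > SUB_BLOCK}


for mode in BLOCK_SIZES:
    print(mode, fs_links(mode))
```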

図１５に示すように、スネークサーチでは、左シフト→上シフト→右シフト→上シフト→左シフト→・・・の順で動作が繰り返される。プロセッシングユニットＰＵ及びシフトレジスタユニットＳＲＵが保持している画素値を最大限利用し、水平方向の最大探索範囲を、ｍｏｄｅ３、ｍｏｄｅ４の場合は±４以下とし、マクロブロックペア、ｍｏｄｅ１、ｍｏｄｅ２の場合は±８以下とすれば、ＦＳにおいてプロセッシングユニットＰＵ及びシフトレジスタユニットＳＲＵの双方ともに中間ロードは必要ない。但し、ストールを生じさせずにＦＳを完了させるためには、上シフトを実行してから次の上シフトを実行するまでの間に、記憶部７に次の行の画素値をロードしておく必要がある。 As shown in FIG. 15, the snake search repeats the operations in the order left shift → up shift → right shift → up shift → left shift → .... The pixel values already held in the processing unit PU and the shift register unit SRU are used to the fullest extent. As a result, FS needs no intermediate load of either unit, provided the maximum horizontal search range is limited as follows:

- ±4 for mode 3 and mode 4
- ±8 for a macroblock pair, mode 1 and mode 2

However, to complete FS without stalls, the pixel values of the next row must be loaded into the storage unit 7 between one up shift and the next.
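The visiting order of the snake search can be generated as follows. This is an illustrative sketch, not the patent's control logic. It makes two assumptions:

- The search starts at the top-left corner (-r, -r) of the window.
- dy grows downward.

Each candidate is labeled with the array operation that reaches it:

- A left shift advances the search to the right.
- An up shift moves it down one row.
- A right shift moves it back to the left.

```python
def snake_order(r):
    """Candidate displacements (dx, dy, op) of a +/-r x +/-r snake search.

    op names the array operation that brings that candidate into the PU.
    """
    order = []
    for i, dy in enumerate(range(-r, r + 1)):
        if i % 2 == 0:
            xs = range(-r, r + 1)           # left-to-right pass
        else:
            xs = range(r, -r - 1, -1)       # right-to-left pass
        for j, dx in enumerate(xs):
            if j == 0:
                op = "initial load" if i == 0 else "up shift"
            else:
                op = "left shift" if i % 2 == 0 else "right shift"
            order.append((dx, dy, op))
    return order


print(len(snake_order(4)))  # 81
```

A ±4 window yields 81 positions, reached as follows. Each step moves the candidate by exactly one pixel.

- 1 initial load
- 40 left shifts
- 32 right shifts
- 8 up shifts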

また、本実施の形態に係る画像処理装置では、横隣接サブブロックＳＢＳＡ間での接続を利用して、２個のプロセッシングユニットＰＵと２個のシフトレジスタユニットＳＲＵとを水平方向で直列に接続することが可能である。従って、ｍｏｄｅ３又はｍｏｄｅ４で要求並列度が２５６以下（つまり使用するサブブロックＳＢＳＡが４個以下）である場合には、一方のプロセッシングユニットＰＵをシフトレジスタとして使用することで、１個のプロセッシングユニットＰＵと３個のシフトレジスタユニットＳＲＵとの直列接続として使用することができる。この場合は、中間ロードなしでの最大探索範囲を±１２まで拡大することが可能となる。 In addition, the image processing apparatus according to this embodiment can use the connection between horizontally adjacent sub-blocks SBSA to connect two processing units PU and two shift register units SRU in series horizontally.

Consider mode 3 or mode 4 with a required degree of parallelism of 256 or less, that is, with four or fewer sub-blocks SBSA in use. In that case, one of the two processing units PU can be used as a shift register. The chain then behaves as one processing unit PU connected in series with three shift register units SRU. This extends the maximum search range without intermediate loads to ±12.
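The three search-range limits quoted above (±4, ±8 and ±12) all fit one simple relation. It is sketched here as an editorial reading, not a formula stated in the description. A horizontal PU/SRU chain holding W reference columns offers W - B + 1 horizontal positions for a block of width B. A ±R search needs 2R + 1 positions. The function name is hypothetical.

```python
def max_horizontal_range(block_width, chain_width):
    """Largest R such that a +/-R horizontal search needs no intermediate load.

    A horizontal PU/SRU chain holding chain_width reference columns offers
    chain_width - block_width + 1 horizontal candidate positions, which must
    cover the 2R + 1 positions of a +/-R search.
    """
    if chain_width < block_width:
        raise ValueError("chain must be at least as wide as the block")
    return (chain_width - block_width) // 2


print(max_horizontal_range(8, 16))   # mode 3/4, one PU + one SRU -> 4
print(max_horizontal_range(16, 32))  # MB pair/mode 1/2, two PUs + two SRUs -> 8
print(max_horizontal_range(8, 32))   # mode 3/4, one PU + three SRUs -> 12
```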

＜ＤＳ（Directional Search）＞ <DS (Directional Search)>
ＤＳは、水平又は垂直に直線探索を行う探索手法である。ＤＳ時におけるサブブロックＳＢＳＡ内の内部結線状態を図２１〜２５に示す。図２１はマクロブロックペア（１６×３２）に対応し、図２２はｍｏｄｅ１（１６×１６）に対応し、図２３はｍｏｄｅ２（１６×８）に対応し、図２４はｍｏｄｅ３（８×１６）に対応し、図２５はｍｏｄｅ４（８×８）に対応する。 DS is a search method that performs a straight-line search horizontally or vertically. FIGS. 21 to 25 show the internal connection state of the sub-blocks SBSA during DS:

- FIG. 21: macroblock pair (16 × 32)
- FIG. 22: mode 1 (16 × 16)
- FIG. 23: mode 2 (16 × 8)
- FIG. 24: mode 3 (8 × 16)
- FIG. 25: mode 4 (8 × 8)

垂直方向探索である場合には、ＳＷＲＡＭ２からデータパスに入力される画素値は、画像回転処理部４（図１参照）によって反時計回りに９０°回転される。アレイ５のサイズが１６×３２画素であり、回転後のマクロブロックペアをアレイ５に保持することができないため、マクロブロックペアに関しては垂直方向探索は不可能である。但し、アレイ５の水平サイズを３２画素以上に拡大することで、マクロブロックペアに関する垂直方向探索も可能となる。 In the case of the vertical search, the pixel value input to the data path from the SWRAM 2 is rotated 90 ° counterclockwise by the image rotation processing unit 4 (see FIG. 1). Since the size of the array 5 is 16 × 32 pixels and the rotated macroblock pair cannot be held in the array 5, the vertical search is not possible for the macroblock pair. However, by expanding the horizontal size of the array 5 to 32 pixels or more, it is possible to perform a vertical search for a macroblock pair.

ＤＳの探索は、左シフト（図１２）のみを用いて行う。ブロックサイズがマクロブロックペア、ｍｏｄｅ１、ｍｏｄｅ２である場合は、図２１〜２３に示すように、横隣接サブブロックＳＢＳＡ間で接続パスが形成され、一方、ｍｏｄｅ３、ｍｏｄｅ４である場合は、図２４，２５に示すように、横隣接サブブロックＳＢＳＡ間で接続パスは形成されない。 The DS search uses only the left shift (FIG. 12). When the block size is a macroblock pair, mode 1 or mode 2, a connection path is formed between horizontally adjacent sub-blocks SBSA, as shown in FIGS. 21 to 23. When it is mode 3 or mode 4, no connection path is formed between horizontally adjacent sub-blocks SBSA, as shown in FIGS. 24 and 25.

シフトレジスタユニットＳＲＵに関しては、８点分の探索を行うごとに１回の中間ロードを行う必要がある。中間ロードの際には、ＳＷＲＡＭ２から８×８画素分の画素値が１サイクルでシフトレジスタユニットＳＲＵに供給される。 With respect to the shift register unit SRU, it is necessary to perform one intermediate load every time eight points are searched. At the time of intermediate loading, pixel values for 8 × 8 pixels are supplied from the SWRAM 2 to the shift register unit SRU in one cycle.

＜ＲＢＭ（Random Block Matching）＞ <RBM (Random Block Matching)>
ＲＢＭは、単一点のみを探索する探索手法である。単一点の探索に関しては特にシフト動作を行う必要はなく、プロセッシングユニットＰＵに初期ロードを行うだけで、その点の評価値が自動的に求まる。 RBM is a search method that evaluates only a single point. Searching a single point requires no shift operation. Performing the initial load on the processing unit PU is enough, and the evaluation value at that point is obtained automatically.

ＦＳ、ＤＳ、ＲＢＭにおいて、ブロックサイズがマクロブロックペアである場合は、８個のサブブロックＳＢＳＡが接続されて、１６×３２の１個のブロックが構成されている。この場合、図２６の（Ａ）に示すように、アレイ５内には１個のブロックのみを構成可能である。 In FS, DS, and RBM, when the block size is a macroblock pair, eight sub-blocks SBSA are connected to form one 16 × 32 block. In this case, as shown in FIG. 26A, only one block can be configured in the array 5.

同様に、ｍｏｄｅ１の場合は、４個（横２個×縦２個）のサブブロックＳＢＳＡが接続されて、１６×１６のブロックが構成されている。この場合、図２６の（Ｂ）に示すように、アレイ５内には最大２個のブロックを構成可能である。２個のブロックの各々は、他のブロックとは独立して動作可能である。 Similarly, in the case of mode 1, four (2 horizontal × 2 vertical) sub-blocks SBSA are connected to form a 16 × 16 block. In this case, as shown in FIG. 26B, a maximum of two blocks can be configured in the array 5. Each of the two blocks can operate independently of the other blocks.

同様に、ｍｏｄｅ２の場合は、横２個のサブブロックＳＢＳＡが接続されて、１６×８のブロックが構成されている。この場合、図２６の（Ｃ）に示すように、アレイ５内には最大４個のブロックを構成可能である。４個のブロックの各々は、他のブロックとは独立して動作可能である。 Similarly, in the case of mode 2, two horizontal sub-blocks SBSA are connected to form a 16 × 8 block. In this case, as shown in FIG. 26C, a maximum of four blocks can be configured in the array 5. Each of the four blocks can operate independently of the other blocks.

同様に、ｍｏｄｅ３の場合は、縦２個のサブブロックＳＢＳＡが接続されて、８×１６のブロックが構成されている。この場合、図２６の（Ｄ）に示すように、アレイ５内には最大４個のブロックを構成可能である。４個のブロックの各々は、他のブロックとは独立して動作可能である。 Similarly, in the case of mode 3, two vertical sub-blocks SBSA are connected to form an 8 × 16 block. In this case, as shown in FIG. 26D, a maximum of four blocks can be configured in the array 5. Each of the four blocks can operate independently of the other blocks.

同様に、ｍｏｄｅ４の場合は、１個のサブブロックＳＢＳＡによって、８×８のブロックが構成されている。この場合、図２７の（Ｅ）に示すように、アレイ５内には最大８個のブロック（サブブロックに等しい）を構成可能である。８個のブロックの各々は、他のブロックとは独立して動作可能である。 Similarly, in the case of mode 4, an 8 × 8 block is configured by one sub-block SBSA. In this case, as shown in FIG. 27E, a maximum of 8 blocks (equivalent to sub-blocks) can be configured in the array 5. Each of the eight blocks can operate independently of the other blocks.
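The block counts for each mode follow directly from the array and sub-block dimensions, as this hypothetical Python check shows. The names are illustrative. It assumes two dimensions:

- The array 5 is 16 pixels wide and 32 pixels tall, as implied by the vertical-search discussion.
- The array is made of 8 × 8 sub-blocks SBSA.

```python
ARRAY_W, ARRAY_H = 16, 32  # array 5, in pixels (width x height)
SUB = 8                    # edge of one sub-block SBSA, in pixels

# Block sizes (width, height) in pixels, as listed in the description.
BLOCK_SIZES = {
    "mb_pair": (16, 32),
    "mode1": (16, 16),
    "mode2": (16, 8),
    "mode3": (8, 16),
    "mode4": (8, 8),
}


def sub_blocks_per_block(mode):
    """Number of sub-blocks SBSA joined to form one block of this size."""
    w, h = BLOCK_SIZES[mode]
    return (w // SUB) * (h // SUB)


def max_blocks(mode):
    """Maximum number of independent blocks of this size in the array."""
    w, h = BLOCK_SIZES[mode]
    return (ARRAY_W // w) * (ARRAY_H // h)


for mode in BLOCK_SIZES:
    print(mode, sub_blocks_per_block(mode), max_blocks(mode))
# mb_pair 8 1
# mode1 4 2
# mode2 2 4
# mode3 2 4
# mode4 1 8
```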

また、図２７の（Ｆ），（Ｇ）に示すように、２種類のｍｏｄｅを同時に実行することも可能である。さらに、図２７の（Ｈ）に示すように、３種類のｍｏｄｅを同時に実行することも可能である。このような場合であっても、複数のブロックの各々は、他のブロックとは独立して動作可能である。また、複数種類のブロックサイズの同時使用と同様に、複数種類の探索手法を同時に実行することも可能である。例えば、図２６の（Ｂ）において、上側の１６×１６のブロックではＦＳ動作を行い、これと同時に、下側の１６×１６のブロックではＤＳ動作を行うことが可能である。 In addition, as shown in FIGS. 27F and 27G, two types of modes can be executed simultaneously. Further, as shown in FIG. 27H, three types of modes can be executed simultaneously. Even in such a case, each of the plurality of blocks can operate independently of the other blocks. Further, similarly to the simultaneous use of a plurality of types of block sizes, a plurality of types of search methods can be executed simultaneously. For example, in FIG. 26B, it is possible to perform the FS operation in the upper 16 × 16 block and simultaneously perform the DS operation in the lower 16 × 16 block.

＜まとめ＞ <Summary>
このように本実施の形態に係る画像処理装置によれば、アレイ５は複数のサブブロックＳＢＳＡ０〜ＳＢＳＡ７に分割されている。そして、処理すべき画像のサイズに応じてマルチプレクサ１０Ａ，１０Ｂ，１１Ａ，１１Ｂの設定を切り換えることによって、アレイ５内に一又は複数のサブブロックＳＢＳＡを含む一又は複数のブロックが設定される。そのため、処理対象であるマクロブロックペアが細分化されてブロックの個数が増えたとしても、アレイ５内に設定された複数のブロックを同時に処理できるため、１マクロブロックペアの探索にかかるサイクル数が増大することを回避できる。また、複数のブロックの各々は他のブロックとは独立に動作可能であるため、１マクロブロックペア内の複数のブロックを並列に処理することができる。その結果、１マクロブロックペアの探索にかかるサイクル数が増大することを回避できる。 As described above, in the image processing apparatus according to this embodiment, the array 5 is divided into a plurality of sub-blocks SBSA0 to SBSA7. The settings of the multiplexers 10A, 10B, 11A and 11B are switched according to the size of the image to be processed. This sets one or more blocks in the array 5, each containing one or more sub-blocks SBSA.

Therefore, the macroblock pair to be processed may be subdivided into more blocks. Even then, the blocks set in the array 5 can be processed simultaneously, which avoids an increase in the number of cycles required to search one macroblock pair. Moreover, each block can operate independently of the others, so the blocks within one macroblock pair can be processed in parallel. As a result, an increase in the number of cycles required to search one macroblock pair is avoided.

数値計算によって本発明の効果を検証した結果を図２８〜３１に示す。サイクル数及びＳＲＡＭからのデータ転送量に関して、本発明に係るＲＲＳＡ構成を、従来のＳＩＭＤ構成及びＲＣＳＡ構成と比較している。 FIGS. 28 to 31 show the results of verifying the effect of the present invention by numerical calculation. The RRSA configuration according to the present invention is compared with the conventional SIMD and RCSA configurations. The comparison covers the number of cycles and the amount of data transferred from the SRAM.

各ｍｏｄｅ毎に、サイクル数及びデータ転送量ともにＳＩＭＤ構成での値を１００％として正規化を行っている。図２８は、ＦＳ動作を探索範囲±４×±４（計８１点）として行った場合の見積もりである。図２９は、ＦＳ動作を探索範囲±８×±８（計２８９点）として行った場合の見積もりである。図３０は、ＤＳ動作を探索範囲連続３３点として行った場合の見積もりである。図３１は、ＲＢＭ動作を探索範囲ランダム１６点として行った場合の見積もりである。図２８〜３１を参照すると、全ての場合において、本発明に係るＲＲＳＡ構成は、従来のＳＩＭＤ構成及びＲＣＳＡ構成と比べて、サイクル数及びデータ転送量を削減できていることが分かる。 For each mode, both the number of cycles and the data transfer amount are normalized so that the value for the SIMD configuration is 100%. The figures give the following estimates:

- FIG. 28: the FS operation with a search range of ±4 × ±4 (81 points in total).
- FIG. 29: the FS operation with a search range of ±8 × ±8 (289 points in total).
- FIG. 30: the DS operation with a search range of 33 consecutive points.
- FIG. 31: the RBM operation with a search range of 16 random points.

FIGS. 28 to 31 show that, in all cases, the RRSA configuration according to the present invention reduces both the number of cycles and the data transfer amount compared with the conventional SIMD and RCSA configurations.
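The full-search point counts can be checked directly. A ±r range gives 2r + 1 candidate positions on each axis, hence (2r + 1)² in total. The function name below is hypothetical.

```python
def fs_points(r):
    """Number of candidate positions in a +/-r x +/-r full search."""
    return (2 * r + 1) ** 2


print(fs_points(4), fs_points(8))  # 81 289
```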

【図１】本発明に係る画像処理装置の全体構成を示す図である。 FIG. 1 is a diagram showing the overall configuration of the image processing apparatus according to the present invention.
【図２】ＳＷＲＡＭから読み出し可能な画像ブロックを示す図である。 FIG. 2 is a diagram showing the image blocks that can be read from the SWRAM.
【図３】画像を回転した場合にアレイに入力される画素の配置を示す図である。 FIG. 3 is a diagram showing the arrangement of pixels input to the array when the image is rotated.
【図４】アレイの全体構成を示す図である。 FIG. 4 is a diagram showing the overall configuration of the array.
【図５】サブブロックの内部構成を示す図である。 FIG. 5 is a diagram showing the internal configuration of a sub-block.
【図６】プロセッシングユニットにおける評価値演算部分の内部構成を示す図である。 FIG. 6 is a diagram showing the internal configuration of the evaluation value calculation part of the processing unit.
【図７】演算素子単体の内部構成を示す図である。 FIG. 7 is a diagram showing the internal configuration of a single computing element.
【図８】シフトレジスタ素子単体の内部構成を示す図である。 FIG. 8 is a diagram showing the internal configuration of a single shift register element.
【図９】記憶部の内部構成を示す図である。 FIG. 9 is a diagram showing the internal configuration of the storage unit.
【図１０】プロセッシングユニット、シフトレジスタユニット、及び記憶部が初期ロードで保持する画素値についての詳細を示す図である。 FIG. 10 is a diagram showing details of the pixel values held by the processing unit, the shift register unit and the storage unit in the initial load.
【図１１】プロセッシングユニット、シフトレジスタユニット、及び記憶部が初期ロードで保持する画素値についての詳細を示す図である。 FIG. 11 is a diagram showing details of the pixel values held by the processing unit, the shift register unit and the storage unit in the initial load.
【図１２】左シフト動作を示す図である。 FIG. 12 is a diagram showing the left shift operation.
【図１３】右シフト動作を示す図である。 FIG. 13 is a diagram showing the right shift operation.
【図１４】上シフト動作を示す図である。 FIG. 14 is a diagram showing the up-shift operation.
【図１５】スネークサーチにおける矩形領域内の探索順を示す図である。 FIG. 15 is a diagram showing the search order within the rectangular area in the snake search.
【図１６】ＦＳ時におけるサブブロック内の内部結線状態を示す図である。 FIG. 16 is a diagram showing the internal connection state of the sub-blocks during FS.
【図１７】ＦＳ時におけるサブブロック内の内部結線状態を示す図である。 FIG. 17 is a diagram showing the internal connection state of the sub-blocks during FS.
【図１８】ＦＳ時におけるサブブロック内の内部結線状態を示す図である。 FIG. 18 is a diagram showing the internal connection state of the sub-blocks during FS.
【図１９】ＦＳ時におけるサブブロック内の内部結線状態を示す図である。 FIG. 19 is a diagram showing the internal connection state of the sub-blocks during FS.
【図２０】ＦＳ時におけるサブブロック内の内部結線状態を示す図である。 FIG. 20 is a diagram showing the internal connection state of the sub-blocks during FS.
【図２１】ＤＳ時におけるサブブロック内の内部結線状態を示す図である。 FIG. 21 is a diagram showing the internal connection state of the sub-blocks during DS.
【図２２】ＤＳ時におけるサブブロック内の内部結線状態を示す図である。 FIG. 22 is a diagram showing the internal connection state of the sub-blocks during DS.
【図２３】ＤＳ時におけるサブブロック内の内部結線状態を示す図である。 FIG. 23 is a diagram showing the internal connection state of the sub-blocks during DS.
【図２４】ＤＳ時におけるサブブロック内の内部結線状態を示す図である。 FIG. 24 is a diagram showing the internal connection state of the sub-blocks during DS.
【図２５】ＤＳ時におけるサブブロック内の内部結線状態を示す図である。 FIG. 25 is a diagram showing the internal connection state of the sub-blocks during DS.
【図２６】アレイ内におけるブロックの設定を示す図である。 FIG. 26 is a diagram showing the setting of blocks in the array.
【図２７】アレイ内におけるブロックの設定を示す図である。 FIG. 27 is a diagram showing the setting of blocks in the array.
【図２８】本発明の効果を検証した結果を示す図である。 FIG. 28 is a diagram showing results of verifying the effect of the present invention.
【図２９】本発明の効果を検証した結果を示す図である。 FIG. 29 is a diagram showing results of verifying the effect of the present invention.
【図３０】本発明の効果を検証した結果を示す図である。 FIG. 30 is a diagram showing results of verifying the effect of the present invention.
【図３１】本発明の効果を検証した結果を示す図である。 FIG. 31 is a diagram showing results of verifying the effect of the present invention.
【図３２】ＳＩＭＤ構成の例を示す図である。 FIG. 32 is a diagram showing an example of the SIMD configuration.
【図３３】ＲＣＳＡ構成の例を示す図である。 FIG. 33 is a diagram showing an example of the RCSA configuration.
【図３４】演算素子の内部構成を示す図である。 FIG. 34 is a diagram showing the internal configuration of a computing element.
【図３５】シフトレジスタの内部構成を示す図である。 FIG. 35 is a diagram showing the internal configuration of a shift register.

Explanation of symbols

１　ＩＭＥコア　1 IME core
２　ＳＷＲＡＭ　2 SWRAM
３　ＴＢバッファ　3 TB buffer
４　画像回転処理部　4 Image rotation processing unit
５　アレイ　5 Array
７　記憶部　7 Storage unit
１０Ａ，１０Ｂ，１１Ａ，１１Ｂ　マルチプレクサ　10A, 10B, 11A, 11B Multiplexers

Claims

1. An image processing apparatus comprising an array in which a plurality of computing elements, each computing an evaluation value based on pixel values of an image, are arranged, wherein
the array is divided into a plurality of sub-blocks each including a predetermined number of the computing elements,
each of the plurality of sub-blocks has selection means capable of selecting whether or not to connect the own sub-block to an adjacent sub-block adjacent to the own sub-block, and
one or more blocks, each including one or more of the sub-blocks, can be set in the array by switching the setting of the selection means according to the size of an image to be processed.

2. The image processing apparatus according to claim 1, wherein, when a plurality of blocks are set in the array, each of the plurality of blocks can operate independently of the other blocks.

3. The image processing apparatus according to claim 1, wherein the sub-block includes:
a first unit having a plurality of the computing elements; and
a second unit having a plurality of registers capable of holding pixel values that are to be, or have been, processed by the computing elements in the first unit; and
the selection means includes:
selection means for selecting, as the input to the first unit of the own sub-block, one of the second unit of the own sub-block and the first unit of the adjacent sub-block; and
selection means for selecting, as the input to the second unit of the own sub-block, one of the first unit of the own sub-block and the second unit of the adjacent sub-block.

4. The image processing apparatus according to claim 3, wherein, when a plurality of the first units and a plurality of the second units are included in one block by connecting a plurality of the sub-blocks, some of the plurality of first units can be used as registers for holding pixel values that are to be, or have been, processed by the computing elements in another first unit.

5. The image processing apparatus according to claim 1, further comprising a storage unit capable of holding the pixel values of an image portion adjacent, in a predetermined direction, to the image portion loaded in the array,
wherein, when the evaluation position in the image is shifted in the predetermined direction, the pixel values held in the storage unit are input from the storage unit to the array.

6. The image processing apparatus according to claim 5, wherein the selection means includes selection means for selecting, as the input to the own sub-block, one of the adjacent sub-block and the storage unit.

7. The image processing apparatus according to claim 1, wherein the sub-block has an adder group that adds the evaluation values computed by the plurality of computing elements in the sub-block, and
the adder group includes:
an adder group for frame images that adds the evaluation values of consecutive rows; and
an adder group for field images that adds the evaluation values of every other row.