JP4413052B2

JP4413052B2 - Data flow graph processing apparatus and processing apparatus

Info

Publication number: JP4413052B2
Application number: JP2004086774A
Authority: JP
Inventors: 洋中島; 達夫平松; 誠岡田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2004-03-24
Filing date: 2004-03-24
Publication date: 2010-02-10
Anticipated expiration: 2024-03-24
Also published as: JP2005275698A

Description

The present invention relates to a reconfigurable circuit whose function can be changed, and more particularly to a technique for processing the data flow graph needed to configure the operation of such a circuit.

In recent years, reconfigurable processors, whose hardware operation can be changed to suit the application, have been under development. Architectures for realizing a reconfigurable processor include approaches based on a DSP (Digital Signal Processor) and approaches based on an FPGA (Field Programmable Gate Array).

An FPGA (Field Programmable Gate Array) allows a circuit configuration to be designed relatively freely by writing circuit data after the LSI has been manufactured, and is used for designing dedicated hardware. An FPGA includes basic cells, each consisting of a lookup table (LUT) that stores the truth table of a logic circuit and an output flip-flop, together with programmable wiring resources that connect the basic cells. A desired logic operation is realized by writing the data to be stored in the LUTs and the wiring data. However, an LSI designed with an FPGA occupies a much larger mounting area than an ASIC (Application Specific IC) design, and is therefore costly. A method has accordingly been proposed in which the circuit configuration is reused by dynamically reconfiguring the FPGA (see, for example, Patent Document 1).
Patent Document 1: Japanese Patent Laid-Open No. 10-256383
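As a minimal illustration of the LUT-based basic cell described above, the following Python sketch stores a truth table and evaluates it by indexing. The helper names `make_lut` and `lut_eval` and the 3-input XOR example are assumptions made for illustration; they do not come from the cited document, and the output flip-flop and wiring resources are not modeled.

```python
def make_lut(func, n_inputs):
    """Precompute the truth table of `func` for every n-bit input combination."""
    table = []
    for index in range(1 << n_inputs):
        # Bit b of `index` is the value of input b.
        bits = [(index >> b) & 1 for b in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_eval(lut, *bits):
    """Evaluate the logic function by looking up the stored truth table."""
    index = sum(bit << pos for pos, bit in enumerate(bits))
    return lut[index]


# Writing different table contents into the same cell changes its function.
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)
and3 = make_lut(lambda a, b, c: a & b & c, 3)
```

Reconfiguring such a cell amounts to rewriting its table, which is why a device with a large number of LUT cells needs a correspondingly large amount of configuration data.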

In satellite broadcasting, for example, the broadcast mode may be switched depending on the season or other factors in order to adjust image quality. A receiver has a separate circuit built into its hardware in advance for each broadcast mode, and a selector switches between these circuits according to the broadcast mode in use. The circuits for the other broadcast modes therefore sit idle in the meantime. When several dedicated circuits are switched in this way and the interval between switches is relatively long, as in mode switching, the LSI can instead be reconfigured instantaneously at the time of switching rather than building in several dedicated circuits. This simplifies the circuit structure, improves versatility, and at the same time reduces the mounting cost. To meet such needs, the manufacturing industry has taken an interest in dynamically reconfigurable LSIs. In particular, LSIs mounted on mobile terminals such as mobile phones and PDAs (personal digital assistants) must be small, and if an LSI can be dynamically reconfigured so that its function is switched as the application requires, its mounting area can be kept small.

An FPGA offers a high degree of freedom in circuit design and is general-purpose. On the other hand, to allow connections between all the basic cells it must include a large number of switches and a control circuit for turning those switches on and off, so the mounting area of the control circuit is inevitably large. Moreover, because the connections between basic cells follow complicated wiring patterns, the wires tend to be long, and because many switches are attached to each wire, the delay is large. For these reasons FPGA-based LSIs are mostly used only for prototyping and experiments, and are not suited to mass production in terms of mounting efficiency, performance, and cost. Furthermore, an FPGA requires configuration information to be sent to a large number of LUT-based basic cells, so configuring a circuit takes considerable time. An FPGA is therefore not suited to applications that require the circuit configuration to be switched instantaneously.

To solve these problems, ALU arrays have been studied in recent years. An ALU array arranges, in multiple stages, multi-function elements called ALUs (Arithmetic Logic Units), each of which has several basic arithmetic functions. In an ALU array, processing flows in one direction from the upper stages to the lower stages, so wiring that connects ALUs in the horizontal direction is basically unnecessary. The circuit scale can therefore be made smaller than that of an FPGA.

In an ALU array, command data controls the arithmetic function configured in each ALU circuit and the wiring of the connection units that link the ALUs of adjacent stages, so that the intended arithmetic processing can be executed. The command data is generally produced by creating a data flow graph (DFG) from a source program written in a high-level programming language such as C, and then generating the command data from that graph.

An ALU comprises a plurality of basic arithmetic elements, each of which performs an arithmetic function such as addition or shifting. The more basic arithmetic elements an ALU has, the greater the computing capability of the ALU array and the more versatile its arithmetic processing. On the other hand, increasing the number of basic arithmetic elements has the drawback of increasing the circuit scale of the ALU accordingly.

For example, if a basic arithmetic element that performs multiplication is built into the ALU, the circuit scale of that element is large, and the circuit scale of the entire ALU array increases. Conversely, when multiplication is executed on an ALU array that has no multiplication element, a DFG that combines additions, shifts, and similar operations must be created and mapped onto the ALU array. In that case the DFG becomes very large, the multiplication takes a long time, processing speed suffers, and the power consumption of the ALU array increases.

The present invention has been made in view of these circumstances, and its object is to provide a technique relating to a reconfigurable circuit and a processing apparatus that create a data flow graph efficiently and execute the data flow graph efficiently.

One aspect of the present invention is a data flow graph processing apparatus that processes a data flow graph to be supplied to a reconfigurable circuit, the reconfigurable circuit comprising a multi-stage array of logic circuits, each capable of selectively executing a plurality of arithmetic functions, and a connection unit that sets the connection relationship between the outputs of the logic circuits in a preceding stage and the inputs of the logic circuits in the following stage. The apparatus comprises: means for generating, from an operation description that describes the behavior of a process, a data flow graph expressing the execution-order dependencies between operations; means for replacing a group of data flows composed of basic operations of the reconfigurable circuit with a single node; and means for deleting a node located in the same stage as a node included in the group of data flows to be replaced, when no operation is assigned to that node.

Another aspect of the present invention is a processing apparatus comprising: a reconfigurable circuit that includes a multi-stage array of logic circuits, each capable of selectively executing a plurality of arithmetic functions, and a connection unit that sets the connection relationship between the outputs of the logic circuits in a preceding stage and the inputs of the logic circuits in the following stage; and a data flow graph processing unit that processes a data flow graph to be supplied to the reconfigurable circuit. The data flow graph processing unit comprises: means for generating, from an operation description that describes the behavior of a process, a data flow graph expressing the execution-order dependencies between operations; means for replacing a group of data flows composed of basic operations of the reconfigurable circuit with a single node; and means for deleting a node located in the same stage as a node included in the group of data flows to be replaced, when no operation is assigned to that node.

Any combination of the above components, and any expression of the present invention in the form of a method, an apparatus, a system, or a computer program, is also effective as an aspect of the present invention.

According to the present invention, a technique can be provided for efficiently processing the data flow graph needed to configure the operation of a reconfigurable circuit.

FIG. 1 is a configuration diagram of a processing apparatus 10 according to the embodiment. The processing apparatus 10 includes an integrated circuit device 26 whose circuit configuration can be reconfigured. The integrated circuit device 26 is implemented as a single chip and includes a reconfigurable circuit 12, a setting unit 14, a control unit 18, an internal state holding circuit 20, an output circuit 22, a first feedback path 24, a delay holding circuit 27, and a second feedback path 29. The function of the reconfigurable circuit 12 can be changed by changing its settings. The reconfigurable circuit 12 is configured as a logic circuit such as a combinational circuit or a sequential circuit. The first feedback path 24 and the second feedback path 29 each act as a feedback path that connects the output of the reconfigurable circuit 12 to its input.

The reconfigurable circuit 12 includes a multi-stage array of logic circuits, each capable of selectively executing a plurality of arithmetic functions, and connection units that can set the connection relationship between the outputs of the logic circuits in a preceding stage and the inputs of the logic circuits in the following stage. Structurally, a connection unit that sets the connection wiring between rows of logic circuits is provided between each pair of adjacent rows. The function of the reconfigurable circuit 12 can be changed by arbitrarily setting the function of each logic circuit in the multi-stage array and the connections between the logic circuits. Each logic circuit in this embodiment has combination wiring that connects several of its basic arithmetic elements, each of which performs one arithmetic function, according to a predetermined rule.

The setting unit 14 supplies the reconfigurable circuit 12 with setting data 40 for configuring the desired circuit. The setting unit 14 may be configured as a command memory that outputs stored data according to the count value of a program counter. In that case the control unit 18 controls the output of the program counter, and the setting data 40 may be the command data output from the command memory.

The internal state holding circuit 20 is configured as a sequential circuit such as a data flip-flop (DFF) and receives the output of the reconfigurable circuit 12. The internal state holding circuit 20 is connected to the first feedback path 24 and feeds the output of the reconfigurable circuit 12 directly back to the input of the reconfigurable circuit 12. The internal state holding circuit 20 also supplies the output of the reconfigurable circuit 12 to the delay holding circuit 27. The internal state holding circuit 20 has a selector which, according to a selection instruction from the control unit, selects the data to be sent to the first feedback path 24 and the data to be supplied to the delay holding circuit 27. The selector may send the same data to both the first feedback path 24 and the delay holding circuit 27.

The delay holding circuit 27 is a memory made up of a plurality of RAMs for storing the output data of the reconfigurable circuit 12. The delay holding circuit 27 can delay the output data of the reconfigurable circuit 12 by an arbitrary time. In this example the delay holding circuit 27 stores the data output from the internal state holding circuit 20, but it may instead store data output directly from the reconfigurable circuit 12. The delay holding circuit 27 writes and reads data according to a write/read enable signal and an address signal from the control unit 18. The delay holding circuit 27 is connected to the second feedback path 29 and, in response to a read instruction from the control unit 18, feeds data back to the input of the reconfigurable circuit 12 at the desired timing. When the setting unit 14 is configured as a command memory, the writing and reading of data in the delay holding circuit 27 may be controlled by the command data supplied from the command memory.
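The delay behavior of the delay holding circuit 27 can be pictured with a small software model. The sketch below is only an assumption about one way such a memory could be driven: the class `DelayHoldingRAM`, the function `delay_stream`, and the circular addressing scheme are hypothetical and are not described in this document. It assumes the requested delay is smaller than the RAM size.

```python
class DelayHoldingRAM:
    """Toy model of a RAM that is written and read by address (hypothetical)."""

    def __init__(self, size):
        self.cells = [None] * size

    def write(self, address, data):
        # Corresponds to asserting write enable with an address signal.
        self.cells[address] = data

    def read(self, address):
        # Corresponds to asserting read enable with an address signal.
        return self.cells[address]


def delay_stream(values, delay, size=8):
    """Return each value `delay` cycles late, using the RAM as a circular buffer."""
    if not 0 <= delay < size:
        raise ValueError("delay must be smaller than the RAM size")
    ram = DelayHoldingRAM(size)
    out = []
    for cycle, value in enumerate(values):
        ram.write(cycle % size, value)
        # Nothing is available until `delay` cycles have elapsed.
        out.append(ram.read((cycle - delay) % size) if cycle >= delay else None)
    return out


delayed = delay_stream([1, 2, 3, 4], delay=2)
```

Here the controller chooses the read address, which is what lets the stored data be returned at a chosen later cycle rather than on the very next one.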

The output circuit 22 is configured as a sequential circuit such as a data flip-flop (DFF) and outputs the data produced by the reconfigurable circuit 12 to the outside. In this example the output circuit 22 receives the data output from the delay holding circuit 27, but it may instead receive data output directly from the reconfigurable circuit 12 or data output from the internal state holding circuit 20.

In the processing apparatus 10 there are two paths that feed the output of the reconfigurable circuit 12 back to its input: the first feedback path 24 and the second feedback path 29. Because the first feedback path 24 does not pass through the delay holding circuit 27, it can feed back the output data of the reconfigurable circuit 12 at high speed. The second feedback path 29, on the other hand, can supply a data signal to the reconfigurable circuit 12 at the desired timing under instruction from the control unit 18. The first feedback path 24 and the second feedback path 29 are thus used selectively, depending on the circuit to be configured on the reconfigurable circuit 12.

The reconfigurable circuit 12 includes logic circuits whose functions can be changed. The logic circuits may be arranged in a matrix. The function of each logic circuit and the connection relationships between the logic circuits are set according to the setting data 40 supplied by the setting unit 14. The combination wiring that connects the basic arithmetic elements inside a logic circuit is also set according to the setting data 40. The setting data 40 is generated by the following procedure.

A program 36 to be realized by the integrated circuit device 26 is held in a storage unit 34. The program 36 is an operation description that describes the processing behavior of a circuit; it expresses a signal processing circuit, a signal processing algorithm, or the like in a high-level language such as C. A compiling unit 30 compiles the program 36 stored in the storage unit 34, converts it into a data flow graph (DFG) 38, and stores the graph in the storage unit 34. The data flow graph 38 expresses the execution-order dependencies between the operations of the circuit, showing the flow of operations on input variables and constants as a graph structure. In general, the data flow graph 38 is drawn so that computation proceeds from top to bottom.
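To make the notion of a data flow graph concrete, the following Python sketch turns a single arithmetic expression into a list of operation nodes, each annotated with the stage at which it can execute. This is an illustrative stand-in, not the compiling unit 30: it uses Python's standard `ast` module on a Python expression instead of compiling C, and the function name `build_dfg`, the node fields, and the operation names are assumptions.

```python
import ast

# Map Python operator node types to illustrative ALU operation names.
OP_NAMES = {
    ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul",
    ast.LShift: "shl", ast.RShift: "shr",
    ast.BitAnd: "and", ast.BitOr: "or",
}


def build_dfg(expression):
    """Return operation nodes in dependency order, each with its stage number."""
    nodes = []

    def visit(node):
        # Input variables and constants feed the graph from stage 0.
        if isinstance(node, ast.Name):
            return node.id, 0
        if isinstance(node, ast.Constant):
            return repr(node.value), 0
        if isinstance(node, ast.BinOp) and type(node.op) in OP_NAMES:
            left, left_stage = visit(node.left)
            right, right_stage = visit(node.right)
            # An operation runs one stage after its latest operand is ready.
            stage = max(left_stage, right_stage) + 1
            node_id = f"n{len(nodes)}"
            nodes.append({"id": node_id, "op": OP_NAMES[type(node.op)],
                          "inputs": (left, right), "stage": stage})
            return node_id, stage
        raise ValueError("unsupported syntax in this sketch")

    visit(ast.parse(expression, mode="eval").body)
    return nodes


dfg = build_dfg("(a + b) - (c << 1)")
```

In this example the addition and the shift do not depend on each other, so both land in stage 1, while the subtraction that consumes their results lands in stage 2. The stage number corresponds to the vertical position of a node when the graph is drawn from top to bottom.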

In this embodiment, each logic circuit has a plurality of basic arithmetic elements, each of which performs its own arithmetic function. Commonly used operations include addition, subtraction, and many others, and multiplication is one of them. From the standpoint of the versatility of the reconfigurable circuit 12, it would be preferable for every logic circuit to include a basic arithmetic element that performs multiplication, but in practice a multiplication element has a far larger circuit scale than elements for other operations such as addition. The logic circuits are therefore built without a basic arithmetic element for multiplication, and each multiplication in the data flow graph 38 must be expanded into a form that can be computed with additions, bit shifts, and the like, using an algorithm such as long multiplication (the pencil-and-paper method) or the Booth algorithm. This expansion of multiplications is performed by the compiling unit 30.
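The long-multiplication expansion mentioned above can be sketched as follows. This is a generic shift-and-add multiplier for unsigned operands, written in Python for illustration; the function name `shift_add_multiply` and the `width` parameter are assumptions, and the Booth algorithm is not shown.

```python
def shift_add_multiply(a, b, width=8):
    """Multiply two unsigned `width`-bit integers using only bit tests,
    shifts, and additions, as in pencil-and-paper multiplication."""
    product = 0
    for i in range(width):
        # Take bit i of the multiplier.
        bit = (b >> i) & 1
        # The partial product is the multiplicand shifted left by i, or zero.
        partial = (a << i) if bit else 0
        # Accumulate the partial products with additions.
        product += partial
    return product


result = shift_add_multiply(13, 11)
```

Each multiplier bit contributes one shift and one addition, so a single multiplication node expands into a number of DFG nodes that grows with the operand width. This is the growth that the description identifies as the cost of omitting a multiplication element.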

A data flow graph processing unit 31 searches the data flow graph 38 generated by the compiling unit 30 for groups of data flows that follow a predetermined rule. Such a group is a set of nodes whose operation nodes have a predetermined connection relationship, and it is registered in the storage unit 34 in advance. The data flow graph processing unit 31 finds each occurrence of a registered group and replaces it with fewer nodes than the group contains. To reduce the node count as far as possible, the group is preferably replaced with a single node.

The node that replaces a group of data flows must be one that a logic circuit of the reconfigurable circuit 12 can process. Accordingly, each logic circuit has combination wiring that combines several of its own basic arithmetic elements in a predetermined order so that it can process the replacement node. With this combination wiring, a logic circuit gains new arithmetic functions, each carried out by several basic arithmetic elements together, without any increase in the number of basic arithmetic elements. In other words, groups of data flows that follow a predetermined rule are registered in advance according to the combinations of basic arithmetic elements that a logic circuit allows, and the data flow graph processing unit 31 can then search for a group of data flows that a single logic circuit can execute and replace it with one node.
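A minimal sketch of this kind of replacement is given below, assuming a registered group that consists of just two chained operations. The function `fuse_pattern`, the dictionary representation of the graph, and the fused operation name `shl_add` are all hypothetical; the document does not specify which groups are registered or how many operands a replacement node may take.

```python
def fuse_pattern(dfg, producer_op, consumer_op, fused_op):
    """Replace each producer -> consumer pair with one fused node.

    `dfg` maps a node id to (operation name, tuple of input ids).
    The sketch assumes the two operations in the pattern are different.
    """
    if producer_op == consumer_op:
        raise ValueError("this sketch only handles two distinct operations")

    # Count how many times each value is consumed, so that a producer is
    # absorbed only when the consumer is its sole user.
    uses = {}
    for _, inputs in dfg.values():
        for source in inputs:
            uses[source] = uses.get(source, 0) + 1

    result = dict(dfg)
    for node_id, (op, inputs) in dfg.items():
        if op != consumer_op:
            continue
        for source in inputs:
            is_producer = source in dfg and dfg[source][0] == producer_op
            if is_producer and uses[source] == 1:
                remaining = tuple(i for i in inputs if i != source)
                # The fused node takes over the producer's operands.
                result[node_id] = (fused_op, dfg[source][1] + remaining)
                del result[source]
                break
    return result


graph = {
    "n0": ("shl", ("a", "k")),
    "n1": ("add", ("n0", "b")),
    "n2": ("sub", ("n1", "c")),
}
fused = fuse_pattern(graph, "shl", "add", "shl_add")
```

After the replacement the shift node disappears and the graph has one node fewer, which also shortens the chain by one stage. Whether a real logic circuit could accept the resulting operands depends on its combination wiring and terminal count, which this sketch does not model.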

A setting data generation unit 32 generates the setting data 40 from the data flow graph 38. The setting data 40 is the data used to map the data flow graph 38 onto the reconfigurable circuit 12; it determines the functions of the logic circuits in the reconfigurable circuit 12 and the connection relationships between them. The setting data generation unit 32 may divide one circuit to be generated into several circuits and generate setting data 40 for each of them.

FIG. 2 is a diagram for explaining the setting data 40 of a plurality of circuits obtained by dividing one target circuit 42 to be generated. A circuit produced by dividing one target circuit 42 is called a "divided circuit". In this example, one target circuit 42 is divided into four divided circuits: divided circuit A, divided circuit B, divided circuit C, and divided circuit D. The target circuit 42 is divided along the flow of operations in the data flow graph 38. When the flow of operations in the data flow graph 38 runs from top to bottom, the graph is cut at predetermined intervals starting from the top, and each cut portion is set as a divided circuit. The interval at which the graph is cut along the flow is set to be no more than the number of stages of logic circuits in the reconfigurable circuit 12. The target circuit 42 may also be divided in the horizontal direction of the data flow graph 38. The width of a horizontal division is set to be no more than the number of logic circuits per stage in the reconfigurable circuit 12.

In particular, when the target circuit 42 to be generated is larger than the reconfigurable circuit 12, the setting data generation unit 32 preferably divides the target circuit 42 into pieces small enough to be mapped onto the reconfigurable circuit 12. One mapping onto the reconfigurable circuit 12 can be performed at once, for example in a single clock. In this case, therefore, the four divided circuits can be generated in four clocks. The setting data generation unit 32 determines how to divide the target circuit 42 from the array structure of the logic circuits in the reconfigurable circuit 12 and from the data flow graph 38. The array structure of the reconfigurable circuit 12 may be reported to the setting data generation unit 32 by the control unit 18, or it may be recorded in the storage unit 34 in advance. The control unit 18 may also instruct the setting data generation unit 32 how to divide the target circuit 42.
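The division along the flow of operations can be sketched as a simple chunking of the graph's stages. The function below is a simplified assumption of how the setting data generation unit 32 might proceed: the name `split_by_stages` and the list-of-stages representation are hypothetical, and horizontal division is not implemented (a stage wider than the array is reported as an error instead).

```python
def split_by_stages(stages, max_stages, max_columns):
    """Cut a staged data flow graph into divided circuits that fit the array.

    `stages` is a list of stages from top to bottom; each stage is a list of
    node ids. Each divided circuit holds at most `max_stages` stages.
    """
    for row in stages:
        if len(row) > max_columns:
            # Horizontal division would be needed; not covered by this sketch.
            raise ValueError("stage is wider than the array")
    return [stages[i:i + max_stages] for i in range(0, len(stages), max_stages)]


# A 10-stage graph mapped onto a 3-stage, 6-column array.
graph_stages = [[f"s{s}_n{n}" for n in range(4)] for s in range(10)]
divided = split_by_stages(graph_stages, max_stages=3, max_columns=6)
```

With these example sizes the graph is cut into four divided circuits of 3, 3, 3, and 1 stages. If each divided circuit is configured in one clock, as the description suggests, the whole target circuit takes four clocks.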

By carrying out the above procedure, the storage unit 34 stores a plurality of setting data 40 for configuring the reconfigurable circuit 12 as the desired circuit. The plurality of setting data 40 are setting data 40a for configuring divided circuit A, setting data 40b for configuring divided circuit B, setting data 40c for configuring divided circuit C, and setting data 40d for configuring divided circuit D. As described above, each of the setting data 40 represents one of the divided circuits obtained by dividing one target circuit 42. By supplying one of the setting data 40, the corresponding divided circuit can be configured on the reconfigurable circuit 12 in one clock. Generating the setting data 40 of the target circuit 42 to suit the circuit scale of the reconfigurable circuit 12 in this way makes it possible to realize a highly versatile processing apparatus 10. Seen from another angle, the processing apparatus 10 of the embodiment makes it possible to configure a desired circuit using a reconfigurable circuit 12 of small circuit scale.

FIG. 3 shows a general configuration of the reconfigurable circuit 12. The reconfigurable circuit 12 includes a multi-stage array of logic circuits, each capable of selectively executing a plurality of arithmetic functions, and connection units 52 that can arbitrarily set the connection relationship between the outputs of the logic circuits in a preceding stage and the inputs of the logic circuits in the following stage. Because of the multi-stage array structure, computation in the reconfigurable circuit 12 proceeds from the upper stages to the lower stages. In this specification and the claims, "multi-stage" means a plurality of stages.

The reconfigurable circuit 12 has ALUs (Arithmetic Logic Units) as its logic circuits. An ALU is an arithmetic logic circuit that can selectively execute several kinds of multi-bit operations, such as logical OR, logical AND, bit shift, addition, and subtraction, according to its settings. Each ALU has a selector for choosing among its arithmetic functions. In the embodiment, the ALU has no function for performing multiplication, because a multiplication element would have a large circuit scale. Multiplication is therefore processed by combining additions, bit shifts, bitwise AND operations, and the like. In the illustrated example, each ALU has two input terminals and one output terminal.

The reconfigurable circuit 12 is configured as an ALU array of X stages and Y columns, with X ALUs in the vertical direction and Y ALUs in the horizontal direction. The example shown here is an ALU array of three stages and six columns, with three ALUs in the vertical direction and six in the horizontal direction. The reconfigurable circuit 12 includes connection units 52 and ALU rows 53. The ALU rows 53 are provided in a plurality of stages, and each connection unit 52 is placed between adjacent ALU rows 53 to set the connection relationship between the outputs of the ALUs in the preceding stage and the inputs of the ALUs in the following stage.

In the example shown in FIG. 3, the connection unit 52b constituting the second stage is provided between the first-stage ALU row 53a and the second-stage ALU row 53b, and the connection unit 52c constituting the third stage is provided between the second-stage ALU row 53b and the third-stage ALU row 53c. The connection unit 52a constituting the first stage is provided above the first-stage ALU row 53a.

Input variables and constants are input to the first-stage ALU 11, ALU 12, ..., ALU 16, which perform the operations set for them. The resulting outputs are input to the second-stage ALU 21, ALU 22, ..., ALU 26 according to the connections set in the second-stage connection unit 52b. The wiring of the second-stage connection unit 52b is constructed so that it can realize either an arbitrary connection relationship between the outputs of the first-stage ALU row 53a and the inputs of the second-stage ALU row 53b, or a connection relationship selected from a predetermined set of combinations; the desired wiring is enabled by the setting. The second-stage ALU 21, ALU 22, ..., ALU 26 receive the outputs of the ALU row 53a and perform the operations set for them. The resulting outputs are input to the third-stage ALU 31, ALU 32, ..., ALU 36 according to the connections set in the wiring of the third-stage connection unit 52c.

Output data from the third-stage ALU row 53c, which is the final stage, is output to the internal state holding circuit 20. The internal state holding circuit 20 and/or the delay holding circuit 27 input the output data to the connection unit 52a via the first feedback path 24 and/or the second feedback path 29. The connection unit 52a sets its wiring and supplies the data to the first-stage ALU 11, ALU 12, ..., ALU 16.
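The stage-by-stage data flow described above can be illustrated with a small software model. The following Python sketch is not part of the specification: the operation table `OPS`, the function `run_array`, and the encoding of each ALU as an `(operation, source_a, source_b)` tuple are assumptions introduced purely for illustration, and the feedback paths are not modeled.

```python
# Hypothetical operation table: each ALU selects one function by name.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "shr1": lambda a, b: a >> 1,  # 1-bit right shift, second input ignored
    "mov": lambda a, b: a,        # pass-through, second input ignored
}


def run_array(inputs, stages):
    """Evaluate a multi-stage ALU array.

    Each stage is a list of (op, src_a, src_b) tuples. The indices src_a and
    src_b select outputs of the previous stage, which plays the role of the
    connection unit placed in front of that stage.
    """
    data = list(inputs)
    for stage in stages:
        data = [OPS[op](data[a], data[b]) for op, a, b in stage]
    return data


if __name__ == "__main__":
    stages = [
        [("add", 0, 1), ("add", 2, 3)],  # first stage: two adders
        [("sub", 1, 0)],                 # second stage: one subtractor
    ]
    print(run_array([1, 2, 3, 4], stages))
```

In this model, changing the tuples of a stage corresponds to writing new setting data: the operation name stands for the selector setting of an ALU, and the two indices stand for the wiring enabled in the connection unit.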

FIG. 4 is a diagram for explaining the structure of the data flow graph 38. In the data flow graph 38, the flow of operations on input variables and constants is expressed stage by stage in a graph structure. In the figure, operators are shown as circles: ">>" denotes a right bit shift, "<<" a left bit shift, "+" addition, and "-" subtraction. The setting data generation unit 32 generates setting data 40 for mapping the data flow graph 38 onto the reconfigurable circuit 12. In particular, when the data flow graph 38 cannot be mapped onto the reconfigurable circuit 12 in its entirety, the data flow graph 38 is divided into a plurality of regions, and setting data 40 is generated for each divided circuit. To realize on the circuit the flow of operations expressed by the data flow graph 38, the setting data 40 specifies the logic circuits to which arithmetic functions are assigned, defines the connection relationships between the logic circuits, and defines input variables, input constants, and the like. Accordingly, the setting data 40 includes selection information supplied to the selector that selects the function of each logic circuit, connection information that sets the wiring of the connection units 52, and the necessary variable data and constant data.

In this embodiment, the logic circuits that can selectively execute a plurality of arithmetic functions have no basic arithmetic element for multiplication. A multiplication is therefore expanded by the compiling unit 30 into operations such as addition and bit shift that the basic arithmetic elements of the logic circuits can process, and the setting data 40 is generated from the expanded operations.

Returning to FIG. 1, when a circuit is to be configured, the control unit 18 selects and reads from the storage unit 34 the plurality of setting data 40 for configuring one target circuit 42. Here, the control unit 18 reads from the storage unit 34 the setting data 40 for configuring the target circuit 42 shown in FIG. 2, namely the setting data 40a of the divided circuit A, the setting data 40b of the divided circuit B, the setting data 40c of the divided circuit C, and the setting data 40d of the divided circuit D, and supplies them to the setting unit 14. The setting unit 14 stores each of the setting data 40.

When the setting unit 14 is configured as a command memory, the control unit 18 gives a program counter value to the setting unit 14, and the setting unit 14 sets the stored setting data corresponding to that counter value in the reconfigurable circuit 12 as command data. The setting unit 14 may instead be configured with a cache memory or another type of memory. This example describes a configuration in which the control unit 18 receives the setting data 40 from the storage unit 34 and supplies it to the setting unit 14, but the setting data may also be stored in the setting unit 14 in advance without passing through the control unit 18. In that case, the control unit 18 controls the data readout of the setting unit 14 so that, from among the plurality of setting data stored in advance in the setting unit 14, the setting data corresponding to the target circuit 42 is supplied to the reconfigurable circuit 12.

The setting unit 14 sets the setting data 40 in the reconfigurable circuit 12 and causes the reconfigurable circuit 12 to reconfigure its circuit sequentially. This allows the reconfigurable circuit 12 to execute the desired operations. Because the reconfigurable circuit 12 uses ALUs with high computing capability as its basic cells, and because the reconfigurable circuit 12 and the setting unit 14 are formed on a single chip, configuration can be performed at high speed, for example in one clock. The control unit 18 has a clock function, and its clock signal is supplied to the output circuit 22 and the delay holding circuit 27. The control unit 18 may also include a quaternary counter and supply its count signal to the setting unit 14.
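The selection of setting data by a counter value can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the function name `configure_sequence` and the representation of the setting unit as a plain list are hypothetical, and the strings "40a" to "40d" merely stand in for the setting data of the divided circuits A to D.

```python
def configure_sequence(setting_memory, clocks):
    """Return the setting data selected on each clock.

    setting_memory models the setting unit as a list indexed by a counter
    value; the counter wraps around when it reaches the number of entries
    (four entries give the behavior of a quaternary counter).
    """
    count = 0
    selected = []
    for _ in range(clocks):
        selected.append(setting_memory[count])
        count = (count + 1) % len(setting_memory)
    return selected


if __name__ == "__main__":
    # Placeholders for the setting data of divided circuits A to D.
    print(configure_sequence(["40a", "40b", "40c", "40d"], 6))
```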

<Description of operation of reconfigurable circuit>
The basic operation of the circuit configuration function of the reconfigurable circuit 12 is described below with reference to FIGS. 5 to 10. On the premise of this basic operation, the method of processing the data flow graph required for setting the operation of the reconfigurable circuit 12 is then described with reference to FIG. 11 and the subsequent drawings.

FIG. 5 shows a 7-tap FIR filter circuit that uses seven points before and after. A specific example of realizing this FIR (Finite Impulse Response) filter circuit with the processing device 10 of the embodiment is described below. In the example of FIG. 5, the coefficients of the FIR filter circuit are set symmetrically.

FIG. 6 shows a circuit obtained by transforming the FIR filter circuit shown in FIG. 5. The transformation exploits the symmetry of the filter coefficients.

FIG. 7 shows a circuit obtained by further transforming the FIR filter circuit shown in FIG. 6. This transformation focuses on the filter coefficients. Specifically, the coefficient 1/16 is replaced with 1/2 × 1/2 × 1/2 × 1/2, the coefficient 2/16 with 1/2 × 1/2 × 1/2, and the coefficient 8/16 with 1/2. An operation with the coefficient 1/2 can be realized by shifting the data right by 1 bit. A 1-bit shifter can be formed in a much smaller area within the ALU than a multi-bit shifter.
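The coefficient replacement above can be checked with a short Python sketch. The helper names `halve` and `scale_by_halving` are assumptions introduced for illustration; the sketch only shows that repeated 1-bit right shifts reproduce scaling by 1/16, 2/16, and 8/16 on integer data, with the usual truncation of shifted-out bits.

```python
def halve(x):
    """Apply the coefficient 1/2 as a 1-bit right shift."""
    return x >> 1


def scale_by_halving(x, times):
    """Apply the coefficient (1/2)**times using only 1-bit right shifts."""
    for _ in range(times):
        x = halve(x)
    return x


if __name__ == "__main__":
    sample = 48
    print(scale_by_halving(sample, 4))  # coefficient 1/16
    print(scale_by_halving(sample, 3))  # coefficient 2/16
    print(scale_by_halving(sample, 1))  # coefficient 8/16
```

Chaining 1-bit shifts gives the same integer result as a single multi-bit shift, which is why the ALU needs only the smaller 1-bit shifter at the cost of additional stages.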

FIG. 8 shows a data flow graph 38a created by compiling the FIR filter circuit shown in FIG. 7. In the figure, "+" denotes addition, ">>1" denotes a 1-bit right shift, and "MOV" denotes a pass-through path. As illustrated, the data flow graph 38a consists of seven stages of operators.

FIG. 9 shows an example of the reconfigurable circuit 12. This reconfigurable circuit 12 is composed of ALUs arranged in 2 stages and 4 columns.

FIG. 10 shows an example of realizing the data flow graph 38a shown in FIG. 8 with the reconfigurable circuit 12 of FIG. 9. Because the data flow graph 38a has 7 stages and 4 columns while the reconfigurable circuit 12 has 2 stages, the data flow graph 38a is divided into four parts in the vertical direction. In the horizontal direction no division is needed, because the number of columns of the data flow graph 38a does not exceed the number of columns of the reconfigurable circuit 12; the example here shows the case in which the two column counts are equal. Each divided data flow graph can be configured on the reconfigurable circuit 12 in one clock.

First, the setting unit 14 configures the contents of the first and second stages of the data flow graph 38a on the reconfigurable circuit 12 using the setting data 40a. The first divided circuit is thereby configured on the reconfigurable circuit 12. Next, the setting unit 14 configures the contents of the third and fourth stages of the data flow graph 38a on the reconfigurable circuit 12 using the setting data 40b, so that the second divided circuit is configured. Next, the setting unit 14 configures the contents of the fifth and sixth stages of the data flow graph 38a using the setting data 40c, so that the third divided circuit is configured. Finally, the setting unit 14 configures the contents of the seventh stage and an eighth stage (MOV) of the data flow graph 38a using the setting data 40d, so that the fourth divided circuit is configured. The output of each of the first to third divided circuits is fed back as the input of the next divided circuit.
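The vertical division described above can be sketched in Python as follows. The function `split_stages` and the stage labels are hypothetical and serve only to show how a 7-stage graph maps onto a 2-stage circuit in four passes, with the last pass padded by a pass-through (MOV) stage.

```python
def split_stages(dfg_stages, circuit_stages, pad="MOV"):
    """Divide a list of DFG stages into groups that fit the circuit.

    Each group holds circuit_stages entries; the final group is padded with
    pass-through stages when the DFG does not fill it completely.
    """
    parts = []
    for start in range(0, len(dfg_stages), circuit_stages):
        part = list(dfg_stages[start:start + circuit_stages])
        part += [pad] * (circuit_stages - len(part))
        parts.append(part)
    return parts


if __name__ == "__main__":
    stages = ["stage1", "stage2", "stage3", "stage4",
              "stage5", "stage6", "stage7"]
    for index, part in enumerate(split_stages(stages, 2), start=1):
        print("divided circuit", index, part)
```

Each returned group corresponds to one set of setting data (40a to 40d in the text), and the output of one group would be fed back as the input of the next.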

In this example, the ALU can be realized with only three functions: "+", ">>1", and "MOV". Because a multi-bit shift is expressed by using the 1-bit shifter several times, the number of functions required of the ALU can be kept very small, which in turn reduces the circuit scale of the reconfigurable circuit 12. Naturally, the data flow graph shown in FIG. 7 can also be configured on the reconfigurable circuit 12. This concludes the description of the basic operation of the reconfigurable circuit 12.

<Description of data flow graph processing function>
The processing performed by the processing device 10 of the embodiment when executing a multiplication is described below. As already stated, in the processing device 10 of this embodiment the ALU is not given a basic arithmetic element for multiplication. Instead, the multiplication is expanded into nodes such as addition and bit shift operations, which are already built into the ALU as basic arithmetic elements, and the resulting data flow graph (DFG) for the multiplication is executed. In this embodiment, a group of data flows consisting of a plurality of nodes is replaced with a single node. This reduces the size of the DFG, allows the reconfigurable circuit 12 to execute the arithmetic processing efficiently, and makes it possible to reduce the circuit scale and the power consumption.

Representative multiplication algorithms include the long multiplication (pencil-and-paper) algorithm and the Booth algorithm. The following takes as an example the case in which the multiplication program 36 is written using the long multiplication algorithm.

FIG. 11 shows a flowchart of the long multiplication algorithm for an m-bit multiplicand x and an n-bit multiplier y. The case in which the multiplier y is positive is assumed here. To execute this processing, an m-bit multiplicand register, an n-bit multiplier register, and an (m+n)-bit product register are assumed to exist. Before the operation, the sign of the multiplier y is determined; if it is positive, the multiplier y is set in the right n bits of the product register and 0 is set as the initial value in the left m bits. If the multiplier y is negative, its absolute value is used as the multiplier y.

The repetition count r is set to the initial value 0 (S10), and it is determined whether the LSB (Least Significant Bit) of the product register is 1 (S11). If the LSB of the product register is 1 (Y in S11), the multiplicand x is added to the left m bits of the product register, the sum is stored in the left m bits of the product register (S12), and the product register is shifted right by 1 bit (S13). This performs the multiplication for the LSB of the multiplier y. If the LSB of the product register is 0 (N in S11), the product register is likewise shifted right by 1 bit (S13); because the product of the LSB of the multiplier y and the multiplicand is 0 in this case, the 1-bit right shift of the product register is all that is needed. In the shift, the MSB (Most Significant Bit) of the product register is filled with the same value that the MSB had before the shift. The repetition count r is incremented by 1 (S14), and it is determined whether r has become equal to n (S15). r becomes equal to n when the long multiplication has been completed for all bits of the multiplier y. If r is less than n (N in S15), steps S11 to S14 are repeated, and the multiplication for the LSB of the product register is executed successively. Shifting right by 1 bit (S13) corresponds to multiplying bit by bit from the lowest bit upward while stepping through the bits of the multiplier y starting from the LSB. When r becomes equal to n (Y in S15), the value stored in the product register is the multiplication result, and the flow ends. If the multiplier y is negative, this flow is repeated (n-1) times, that is, (number of bits of the multiplier y - 1) times, and finally the sign-inverted value of the multiplicand x is added to the left m bits of the product register. Any overflow from this addition is discarded, the result is stored in the left m bits of the product register, and the product register is then shifted right by 1 bit.
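The flow of steps S10 to S15, together with the final step for a negative multiplier, can be sketched in Python as shown below. This is an illustrative reading of the text rather than the implementation of the embodiment: the function name `long_multiply` is hypothetical, the product register is modeled as an unbounded Python integer instead of an (m+n)-bit register (so no overflow has to be discarded), and the two's-complement bit pattern of y is loaded into the lower n bits, as in the register values quoted for FIGS. 12 and 13.

```python
def long_multiply(x, y, n):
    """Multiply x by an n-bit two's-complement multiplier y by shift-and-add.

    The register holds the multiplier bit pattern in its lower n bits and
    accumulates partial products in the bits above them.
    """
    reg = y & ((1 << n) - 1)          # lower n bits = bit pattern of y
    negative = y < 0
    steps = n - 1 if negative else n  # a negative y repeats the flow n-1 times
    for _ in range(steps):
        if reg & 1:                   # S11: is the LSB equal to 1?
            reg += x << n             # S12: add x to the upper part
        reg >>= 1                     # S13: arithmetic 1-bit right shift
    if negative:
        reg += (-x) << n              # add the sign-inverted multiplicand
        reg >>= 1
    return reg


if __name__ == "__main__":
    print(long_multiply(-5, 7, 4))    # operands of the FIG. 12 example
    print(long_multiply(-5, -1, 4))   # operands of the FIG. 13 example
```

Python's `>>` on a negative integer is an arithmetic shift, which matches the requirement that the MSB keep its value when the product register is shifted.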

FIG. 12 shows an example of a concrete processing operation for a 4-bit × 4-bit multiplication in binary notation. This processing operation follows the flowchart shown in FIG. 11. FIG. 12 shows the case in which the multiplier y is positive, with the multiplicand x being 1011 (-5 in decimal) and the multiplier y being 0111 (7 in decimal). When 00000111 is set in the product register ans as the initial setting, the multiplication result is obtained by the long multiplication algorithm as shown in FIG. 12.

FIG. 13 shows another example of a concrete processing operation for a 4-bit × 4-bit multiplication in binary notation. This processing operation follows the flowchart shown in FIG. 11. FIG. 13 shows the case in which the multiplier y is negative, with the multiplicand x being 1011 (-5 in decimal) and the multiplier y being 1111 (-1 in decimal). When 00001111 is set in the product register ans as the initial setting, the multiplication result is obtained by the long multiplication algorithm as shown in FIG. 13. Because the multiplier y is negative, the processing of S11 to S13 is repeated three times, after which the predetermined negative-sign processing is executed to complete the multiplication.

FIG. 14 shows a DFG that computes a 24-bit multiplicand × 4-bit multiplier product using the long multiplication algorithm. This DFG has been expanded by the compiling unit 30 so that it can be executed with the basic arithmetic elements (basic operation nodes) contained in the ALU. In the top stage of FIG. 14, "1:_Ghissann_ver3_9_1_0_0.y" denotes the input of the 4-bit multiplier (y path), and "2:_Ghissann_ver3_9_1_0_0.x" denotes the input of the 24-bit multiplicand (x path). In the final stage, "_Ghissann_ver3_9_1_0_0.ans" denotes the multiplication result. In FIG. 14, an "add" node is an addition, and a "bgt" node is a magnitude comparison (<); "1:bgt:20" in the first stage represents "y<0". A "neg" node is a sign inversion, and "lsl" is a left shift; "6:lsl:20" in the first stage represents the 4-bit left shift "<<4". An "mbgt" node is a selection operation that selects one of its two inputs according to the condition of the "bgt" node, with the broken line representing the condition data. An "asr" node is a right shift, a "bne" node is an equality comparison (!=), and an "mbne" node is a selection operation (merge operation) that selects one of its two inputs according to the condition of the "bne" node, again with the broken line representing the condition data. A "nop" node performs no operation and simply passes data through. FIG. 14 shows that the 24-bit × 4-bit computation requires 20 stages of nodes and 65 basic operation nodes.

FIG. 15 shows the data flow graph of FIG. 14 with each group of data flows that follows a predetermined rule enclosed by a dotted line. This DFG contains four node groups 70 arranged according to the predetermined rule. These groups of data flows, that is, groups of nodes, have the property that they use the same set of arithmetic functions and execute them in the same order. The data flow graph processing unit 31 (see FIG. 1) searches the DFG shown in FIG. 15 for node groups that follow the predetermined rule.

FIG. 16 shows the DFG of the node group 70 shown in FIG. 15. As already described, this node group 70 is used in the course of the multiplication. The DFG shown in FIG. 16 consists of a 1-bit right shift, an addition, a nop (signal pass-through), a bitwise AND (bit product) with 1, an equality comparison with 1, and a merge operation. In this embodiment, the node group 70 is replaced with a single new node "mul_t". The node mul_t is therefore a node composed of a plurality of basic operations. As shown in FIG. 15, the DFG of the 24-bit multiplicand × 4-bit multiplier multiplication contains four sub-DFGs having the structure of the node mul_t.

The node mul_t can be processed in a single ALU. For this purpose, the ALU is provided with combination wiring that connects a plurality of basic arithmetic elements, namely the right bit shift element, the addition element, the bitwise AND element, the equality comparison element, and the merge element, so as to realize the arrangement of the node group 70. Because this combination wiring merely connects existing basic arithmetic elements, it does not increase the circuit scale significantly. When the ALU executes only an addition, for example, the addition element is selected by the selector in the ALU; when it executes the operation of the node mul_t, the required basic arithmetic elements are linked by the combination wiring in the ALU so that the processing of the node group 70 is executed.
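Viewed as a function, the node mul_t performs one step of the shift-and-add multiplication. The Python sketch below is an interpretation of the operations listed for FIG. 16, not a description of the actual wiring: the function name `mul_t`, its argument names, and the exact order in which the elements are chained are assumptions, and the second argument is taken to be the multiplicand already shifted left by the multiplier width.

```python
def mul_t(reg, x_shifted):
    """One multiplication step built from basic ALU operations."""
    lsb = reg & 1                          # bitwise AND (bit product) with 1
    take_sum = (lsb == 1)                  # comparison with 1 (condition data)
    total = reg + x_shifted                # addition
    merged = total if take_sum else reg    # merge (selection by the condition)
    return merged >> 1                     # 1-bit arithmetic right shift


if __name__ == "__main__":
    # Four chained mul_t steps for a positive 4-bit multiplier.
    x, y, n = -5, 7, 4
    reg = y
    for _ in range(n):
        reg = mul_t(reg, x << n)
    print(reg)
```

Chaining as many mul_t steps as there are multiplier bits reproduces the loop of FIG. 11 for a positive multiplier; the multiplicand input is simply passed along unchanged from step to step, which corresponds to the nop in the node group.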

FIG. 17 shows the configuration of the data flow graph processing unit 31. The data flow graph processing unit 31 includes a node group search unit 60, a node replacement unit 61, and a DFG reconstruction unit 62. The data flow graph processing function of the embodiment is realized in the processing device 10 by a CPU, a memory, a DFG processing program loaded into the memory, and the like; the figure depicts functional blocks realized by their cooperation. The DFG processing program may be built into the processing device 10 or may be supplied from outside in a form stored on a recording medium. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by hardware only, by software only, or by a combination of the two.

The node group search unit 60 searches the DFG generated by the compiling unit 30 for groups of data flows that follow a predetermined rule, that is, for node groups 70 that follow the predetermined rule. In the example of FIG. 15, the node groups enclosed by dotted lines exhibit the predetermined regularity, so the node group search unit 60 finds four node groups 70 in the DFG.

The node replacement unit 61 replaces each node group 70 that was found with a predetermined number of nodes. The replacing nodes are preferably fewer in number than the nodes contained in the node group 70 and span fewer stages than the node group 70. As described above, in this embodiment the node replacement unit 61 replaces the node group 70 with a single node mul_t.
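The search-and-replace step can be pictured with a deliberately simplified Python sketch. A real DFG is a graph, whereas this sketch flattens one path into a list of node labels; the function `replace_groups` and the four labels used as the repeated pattern are hypothetical and do not reproduce the exact stage layout of FIG. 16.

```python
def replace_groups(nodes, pattern, new_node):
    """Replace every occurrence of pattern in nodes with a single new_node."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    result = []
    i = 0
    while i < len(nodes):
        if nodes[i:i + len(pattern)] == pattern:
            result.append(new_node)      # whole group becomes one node
            i += len(pattern)
        else:
            result.append(nodes[i])      # node outside any group is kept
            i += 1
    return result


if __name__ == "__main__":
    group = ["and", "bne", "add", "asr"]  # hypothetical 4-stage group
    path = ["lsl"] + group * 4 + ["nop"]
    print(replace_groups(path, group, "mul_t"))
```

In this simplified picture, each four-stage group collapses to one stage, so a path with four groups shrinks from 18 entries to 6.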

FIG. 18 shows the DFG after the node replacement. It shows the state in which each node group 70, which consisted of four stages (see FIG. 16), has been replaced with a single node mul_t.

The DFG reconstruction unit 62 reconstructs the DFG using the replacement node mul_t. When no operation is assigned to a node located in the same stage as the nodes contained in the node group 70, that is, when a nop node is assigned there, the DFG reconstruction unit 62 deletes that nop node. This makes the reconstructed DFG smaller. The DFG reconstruction unit 62 further swaps the position of the sign inversion node in the x input path and merges the two x input paths into one. This makes the reconstructed DFG smaller still.

FIG. 19 shows the DFG after deletion of the nop nodes located in the same stages as the nodes contained in the node groups 70. Because each four-stage node group 70 has been replaced with a single node mul_t, three stages of nop nodes in the x input path, which carries the multiplicand, become redundant for each node mul_t. Compared with FIG. 18, ten stages of nop nodes can be deleted, making the DFG smaller.

FIG. 20 shows the state in which the sign inversion node (-1×) present in one of the two x input paths has been swapped with the nop node placed at the end of the same path.

If the 4-bit left shift operation in the second stage of the right-hand x input path is swapped with the nop node in the first stage of the same path, or if the 4-bit left shift operation in the first stage of the left-hand path is swapped with the nop node in the second stage of that path, the sequences of nodes above the sign inversion node (-1×) become exactly the same in the two paths.

FIG. 21 shows the DFG after the 4-bit left shift operation in the second stage of the right-hand path has been swapped with the nop node in the first stage of the same path. FIG. 21 shows that the two x input paths can be merged into one.

FIG. 22 shows the DFG after the x input paths have been merged into one. As FIG. 22 shows, using the node mul_t allows the 24-bit multiplicand × 4-bit multiplier multiplication to be executed with 8 stages of ALU rows. Compared with the 20 stages required by the DFG of FIG. 14, the number of stages is reduced to 0.4 (= 8/20) times. FIG. 22 also shows that the number of nodes is 15, which is 0.23 (= 15/65) times the 65 nodes required by the DFG of FIG. 14. The number of DFG stages is thus reduced, which reduces the number of circuit reconfigurations in the reconfigurable circuit 12 and the power consumed by circuit reconfiguration.

FIG. 23 shows a state in which the condition processing part occupying the upper two stages of the y system in FIG. 22 has been replaced with a bit product (bitwise AND) operation node. The bit product operation combines the multiplier y with a value whose bits are all 1. For example, when y = 3, y is positive, so the condition processing part of FIG. 22 evaluates to F (false) and "3 = 0011" is output from the merge node. The bit product operation of FIG. 23 gives "(3 = 0011) & (1111) = 0011", which is the same as the output of the condition processing part of FIG. 22. When y = −3, y is negative, so the condition processing part of FIG. 13 evaluates to T (true) and "16 + (−3) = 13 = 1101" is output from the merge node. The bit product operation of FIG. 23 gives "(−3 = 1101) & (1111) = 1101", which is again the same as the output of the condition processing part of FIG. 22. Using the bit product operation node therefore reduces the number of DFG stages by one compared with the DFG of FIG. 22, which performs the condition processing in the upper two stages of the y system.
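The equivalence described above can be checked exhaustively for a 4-bit multiplier. The following is a minimal sketch, not the patented circuit: the function names `conditional_form` and `bit_product_form` are hypothetical, and the sketch assumes that y is a 4-bit two's-complement value and that the condition processing adds 16 when y is negative.

```python
def conditional_form(y: int, bits: int = 4) -> int:
    """Two-stage condition processing: add 2**bits when y is negative, else pass y through."""
    return y + (1 << bits) if y < 0 else y


def bit_product_form(y: int, bits: int = 4) -> int:
    """Single bit product (bitwise AND) of y with an all-ones mask of the given width."""
    return y & ((1 << bits) - 1)


# Compare both forms over every 4-bit two's-complement value (-8 .. 7).
all_equal = all(conditional_form(y) == bit_product_form(y) for y in range(-8, 8))

# The two worked cases from the text: y = 3 and y = -3.
print(format(bit_product_form(3), "04b"), format(bit_product_form(-3), "04b"))  # 0011 1101
print(all_equal)  # True
```

Python integers behave as unbounded two's-complement values under `&`, which is why masking a negative y yields the same bit pattern as adding 16.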

FIG. 24 shows the DFG of a multiplication of an 8-bit multiplicand by an 8-bit multiplier. In this case, eight mul_t nodes are required. As in the DFG shown in FIG. 23, the condition processing part occupying the upper two stages of the y system can be replaced with a bit product operation node.

FIG. 25 shows the DFG of a multiplication of a 4-bit multiplicand by a 24-bit multiplier. In this case, 24 mul_t nodes are required, and 2 to the 24th power is added in the first-stage addition. The DFGs above show that as many mul_t nodes are required as there are bits in the multiplier. Exchanging the multiplicand and the multiplier does not change the result, so a 24-bit multiplicand × 4-bit multiplier and a 4-bit multiplicand × 24-bit multiplier give the same product, but the DFG can be created more efficiently when the multiplier has fewer bits. Thus, when the number of bits of the multiplier is larger than that of the multiplicand, it is preferable to exchange the multiplier and the multiplicand before creating the DFG.
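The operand exchange described above can be sketched as a small pre-processing step. This is an illustrative sketch only: `order_operands` is a hypothetical helper, and the assumption that the mul_t count equals the multiplier bit width is taken from the 24 × 4, 8 × 8, and 4 × 24 examples in the text.

```python
def order_operands(a: int, a_bits: int, b: int, b_bits: int):
    """Return (multiplicand, multiplier, mul_t_count), using the narrower operand as multiplier.

    mul_t_count assumes one mul_t node per multiplier bit, as in the examples above.
    """
    if a_bits < b_bits:
        # a is narrower, so it becomes the multiplier.
        return b, a, a_bits
    # b is narrower (or the widths are equal), so keep b as the multiplier.
    return a, b, b_bits


# A 4-bit value times a 24-bit value: the 4-bit value is chosen as the multiplier.
multiplicand, multiplier, count = order_operands(5, 4, 1000, 24)
print(multiplicand, multiplier, count)  # 1000 5 4
```

The product is unchanged by the exchange; only the number of mul_t nodes, and therefore the DFG size, differs.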

FIG. 26 shows another example of the long multiplication (written calculation) algorithm. As one example, FIG. 26 shows the multiplication of 4 bits (1011 in binary) × 4 bits (1111 in binary), which is 11 × 15 in decimal. The aim in this example is to divide the multiplication at predetermined digits and to process the divided digits in parallel. Processing the multiplication in parallel reduces the number of DFG stages and makes efficient arithmetic processing possible. In the following, the multiplication is divided into the odd digits and the even digits of the multiplier in binary notation. This relies on the fact that the result is the same when the multiplication by the odd bits and the multiplication by the even bits of the multiplier y are performed in parallel and the two results are added at the end.
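The arithmetic behind this split can be illustrated with the 11 × 15 example. The sketch below is an assumption-laden illustration rather than the DFG itself: `split_multiply` is a hypothetical name, operands are treated as unsigned, and bit positions are counted from 0, so which half is called "even" or "odd" may differ from the digit numbering in FIG. 26.

```python
def split_multiply(x: int, y: int, bits: int):
    """Multiply unsigned x by unsigned y, accumulating two independent partial sums.

    One sum takes the partial products of the multiplier bits at positions 0, 2, 4, ...
    and the other those at positions 1, 3, 5, ...; the two sums do not depend on
    each other, so they could be computed in parallel and added at the end.
    """
    even = sum(x << i for i in range(0, bits, 2) if (y >> i) & 1)
    odd = sum(x << i for i in range(1, bits, 2) if (y >> i) & 1)
    return even, odd, even + odd


even, odd, total = split_multiply(0b1011, 0b1111, 4)
print(even, odd, total)  # 55 110 165
```

Each half only walks every other multiplier bit, which is why the chain in each half is roughly half as long as a single sequential chain.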

FIG. 27 shows the DFG of the node group 70 used when the multiplication is divided into even digits and odd digits. The DFG shown in FIG. 27 consists of a right 2-bit shift operation, an addition operation, a nop (signal pass-through), a bit product operation with 1, an equality comparison with 1, and a merge operation. In the following, this node group 70 is replaced with one new node, "mul_t2". The node mul_t2 is therefore a node composed of the plurality of basic operations listed above.

Like the node mul_t shown in FIG. 16, the node mul_t2 can be processed in a single ALU. For this purpose, the ALU is provided with combination wiring that connects a plurality of basic arithmetic elements, namely a right 2-bit shift element, an addition element, a bit product element, an equality comparison element, and a merge element, so as to realize the arrangement of the node mul_t2. Because this combination wiring merely connects existing basic arithmetic elements, the circuit scale does not increase much. When the operation of the node mul_t2 is executed, the necessary basic arithmetic elements are linked by the combination wiring inside the ALU, and the arithmetic processing of the node group 70 is thereby executed.
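One way to picture the node mul_t2 is as a single step function that two parallel chains apply repeatedly. The following sketch is a hypothetical software model, not the wiring of FIG. 27: the signature of `mul_t2`, the placement of the left shifts of x outside the node, and the restriction to unsigned operands are all assumptions made for illustration.

```python
def mul_t2(acc: int, x: int, y: int):
    """Hypothetical model of one mul_t2 node built from the listed basic operations."""
    bit = y & 1                               # bit product with 1
    taken = acc + x                           # addition
    passed = acc                              # nop (signal pass-through)
    merged = taken if bit == 1 else passed    # equality comparison with 1, then merge
    return merged, y >> 2                     # right 2-bit shift of the multiplier


def multiply(x: int, y: int, bits: int) -> int:
    """Multiply unsigned x by unsigned y using two independent mul_t2 chains."""
    acc_a, acc_b = 0, 0
    y_a, y_b = y, y >> 1      # chain A sees bits 0, 2, 4, ...; chain B sees bits 1, 3, 5, ...
    x_a, x_b = x, x << 1      # partial-product weights for the two chains (assumed external)
    for _ in range((bits + 1) // 2):
        acc_a, y_a = mul_t2(acc_a, x_a, y_a)
        acc_b, y_b = mul_t2(acc_b, x_b, y_b)
        x_a <<= 2
        x_b <<= 2
    return acc_a + acc_b      # final addition of the two partial results


print(multiply(0b1011, 0b1111, 4))  # 165
```

In this model each chain needs about half as many steps as the multiplier has bits, which is consistent with the reduction in stage count that the split is meant to achieve.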

FIG. 28 shows a DFG in which the multiplication is realized with mul_t. The multiplicand x is 10 bits and the multiplier y is 10 bits. When the DFG is realized with mul_t, it has 37 nodes and 19 stages.

FIG. 29 shows a DFG in which the multiplication is realized with mul_t2. For comparison with FIG. 28, the multiplicand x is 10 bits and the multiplier y is 10 bits. When the DFG is realized with mul_t2, it has 34 nodes and 13 stages. Therefore, for a 10-bit × 10-bit multiplication, realizing the multiplication with mul_t2 makes it possible to reduce the scale of the DFG.

The present invention has been described above based on the embodiment. The embodiment is an example, and those skilled in the art will understand that various modifications can be made to the combinations of its constituent elements and processing steps, and that such modifications are also within the scope of the present invention.

For example, although the embodiment describes the processing of multiplication, division, which is also commonly used, can be processed in the same manner. In this way, for an operation that appears frequently in DFGs and whose dedicated arithmetic element would have a large circuit scale, replacing a node group having a predetermined regularity with another node makes it possible to reduce the size of the DFG and to achieve high-speed processing in the processing apparatus 10.

The arrangement of ALUs in the reconfigurable circuit 12 is not limited to a multi-stage arrangement that permits connections only in the vertical direction, and may be a mesh arrangement that also permits connections in the horizontal direction. In the above description, no connection wiring is provided for connecting logic circuits across skipped stages, but a configuration that provides such stage-skipping connection wiring may also be used.

The embodiment disclosed here should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

FIG. 1 is a block diagram of the processing apparatus according to the embodiment.
FIG. 2 is a diagram for explaining the setting data of the plurality of circuits obtained by dividing the target circuit to be generated.
FIG. 3 is a diagram showing the general configuration of a reconfigurable circuit.
FIG. 4 is a diagram for explaining the structure of a data flow graph.
FIG. 5 is a diagram showing a 7-tap FIR filter circuit that uses seven preceding and following points.
FIG. 6 is a diagram showing a circuit that replaces the FIR filter circuit shown in FIG. 5.
FIG. 7 is a diagram showing a circuit that further replaces the FIR filter circuit shown in FIG. 6.
FIG. 8 is a diagram showing a data flow graph created by compiling the FIR filter circuit shown in FIG. 7.
FIG. 9 is a diagram showing an example of a reconfigurable circuit.
FIG. 10 is a diagram showing an example in which the data flow graph shown in FIG. 8 is realized using the reconfigurable circuit of FIG. 9.
FIG. 11 is a diagram showing a flowchart of the long multiplication algorithm for an m-bit multiplicand x and an n-bit multiplier y.
FIG. 12 is a diagram showing an example of a specific processing operation for 4 bits × 4 bits in binary notation.
FIG. 13 is a diagram showing another example of a specific processing operation for 4 bits × 4 bits in binary notation.
FIG. 14 is a diagram showing a DFG that calculates a 24-bit multiplicand × 4-bit multiplier using the long multiplication algorithm.
FIG. 15 is a diagram in which groups of data flows having a predetermined regularity in the data flow graph shown in FIG. 14 are enclosed by dotted lines.
FIG. 16 is a diagram showing the DFG of the node group shown in FIG. 15.
FIG. 17 is a diagram showing the configuration of the data flow graph processing unit.
FIG. 18 is a diagram showing the DFG after node replacement.
FIG. 19 is a diagram showing the DFG from which the nop nodes located at the same stages as the nodes included in the node group have been deleted.
FIG. 20 is a diagram showing a state in which the sign inversion node (−1×) in one of the two systems of the x input system has been swapped with the nop node placed at the end of the same system.
FIG. 21 is a diagram showing the DFG after the 4-bit left shift operation at the second stage of the right system has been swapped with the nop node at the first stage of the same system.
FIG. 22 is a diagram showing the DFG after the x input system has been integrated into one system.
FIG. 23 is a diagram showing a state in which the condition processing part occupying the upper two stages of the y system in FIG. 22 has been replaced with a bit product operation node.
FIG. 24 is a diagram showing the DFG of a multiplication of an 8-bit multiplicand by an 8-bit multiplier.
FIG. 25 is a diagram showing the DFG of a multiplication of a 4-bit multiplicand by a 24-bit multiplier.
FIG. 26 is a diagram showing another example of the long multiplication algorithm.
FIG. 27 is a diagram showing the DFG of the node group used when the multiplication is divided into even digits and odd digits.
FIG. 28 is a diagram showing a DFG in which the multiplication is realized with mul_t.
FIG. 29 is a diagram showing a DFG in which the multiplication is realized with mul_t2.

Explanation of symbols

10: processing apparatus, 12: reconfigurable circuit, 14: setting unit, 18: control unit, 26: integrated circuit device, 30: compile unit, 31: data flow graph processing unit, 32: setting data generation unit, 34: storage unit, 36: program, 38: data flow graph, 40: setting data, 52: connection unit, 53: ALU array, 60: node group search unit, 61: node replacement unit, 62: DFG reconstruction unit, 70: node group.

Claims

A data flow graph processing apparatus for processing a data flow graph to be supplied to a reconfigurable circuit having a multi-stage arrangement of logic circuits, each capable of selectively executing a plurality of arithmetic functions, and a connection section for setting a connection relationship between an output of a preceding-stage logic circuit and an input of a succeeding-stage logic circuit, the data flow graph processing apparatus comprising:
means for generating a data flow graph expressing the dependency of execution order between operations, based on a behavioral description describing the behavior of processing;
means for replacing a group of data flows composed of basic operations in the reconfigurable circuit with one node; and
means for deleting a node located at the same stage as a node included in the group of data flows to be replaced, when no operation is assigned to that node.