JPH1196018A

JPH1196018A - Compiling device, its method and computer-readable recording medium recording compiling execution program

Info

Publication number: JPH1196018A
Application number: JP9256594A
Authority: JP
Inventors: Masaki Arai; 正樹新井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1997-09-22
Filing date: 1997-09-22
Publication date: 1999-04-09

Abstract

PROBLEM TO BE SOLVED: To execute instruction scheduling considering of a by-pass and to optimize the use of the by-pass on a confluent point of a control flow graph. SOLUTION: In the compiling device, an intermediate code conversion part 14 converts a VLIW instruction string e.g. inputted from a source program input part 12 into an intermediate code instruction string. An instruction scheduling part 16 prepares a control flow graph for the instruction string converted into the intermediate code and generates an extended resource reservation table 18 extended by registering by-pass information for specifying a by-pass for reading out an operation number and a source operand in a writing part of a resource reservation table in each reference block of the control flow graph. An assembly code conversion part 20 converts an instruction including by-pass specification information into an assembly code based on the operation number and the by-pass information registered in the table 18.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

【発明の属する技術分野】本発明は、命令をパイプライ
ンにより処理するプロセッサに用いられるプログラムを
コンパイルするコンパイル装置及び方法並びにコンパイ
ル実行プログラムを記録したコンピュータ読み取り可能
な記録媒体に関する。BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a compiling apparatus and method for compiling a program used in a processor which processes instructions by a pipeline, and a computer-readable recording medium on which a compiling execution program is recorded.

【０００２】[0002]

【従来技術】近年、描画プロセッサ等にあっては画素単
位で行う膨大な数の幾何変換処理を高速で実行するた
め、ＶＬＩＷ命令をパイプライン処理より並列実行する
ようにしている。図２２は、並列パイプライン処理によ
り実行されるＶＬＩＷ命令である。１つのＶＬＩＷ命令
１は、並列に動作する例えば４個の命令エレメント２
Ａ，２Ｂ，２Ｃ，２Ｄから構成され、各命令エレメント
２Ａ〜２Ｄは例えば３０ビットの長さを持つ。ＶＬＩＷ
命令１の各命令エレメント２Ａ〜２Ｄの位置をスロット
と呼び、スロットＡ、スロットＢ、スロットＣ及びスロ
ットＤで表わす。2. Description of the Related Art In recent years, VLIW instructions are executed in parallel by pipeline processing in a drawing processor or the like in order to execute a huge number of geometric transformation processes performed on a pixel basis at a high speed. FIG. 22 shows a VLIW instruction executed by the parallel pipeline processing. One VLIW instruction 1 is, for example, four instruction elements 2 operating in parallel.
A, 2B, 2C, and 2D. Each instruction element 2A to 2D has a length of, for example, 30 bits. VLIW
The positions of the instruction elements 2A to 2D of the instruction 1 are called slots, and are represented by slots A, B, C, and D.

【０００３】１つのＶＬＩＷ命令１に含まれる４個の命
令エレメント２Ａ〜２Ｄは、４並列ＶＬＩＷパイプライ
ン演算部３に与えられて同時に実行される。図２３は、
４並列ＶＬＩＷパイプライン演算部３によるパイプライ
ンの流れであり、連続するＶＬＩＷ命令１，ＶＬＩＷ命
令２，ＶＬＩＷ命令３を例にとっている。ここでパイプ
ラインステージは、例えばスロットＡとスロットＣは、
Ｄ，Ｅ１，Ｅ２１，Ｗであり、スロットＢはＤ，Ｅ，
Ｎ，Ｗであり、スロットＤはＤ，Ｅ，Ｒ，Ｗであり、命
令エレメントによって異なるが最初は命令を解読するＤ
ステージであり、また最後はレジスタに書き込むＷステ
ージとなっており、その間に２つの演算ステージをもっ
ている。[0003] Four instruction elements 2A to 2D included in one VLIW instruction 1 are supplied to a 4-parallel VLIW pipeline operation unit 3 and are simultaneously executed. FIG.
This is the flow of the pipeline by the 4-parallel VLIW pipeline operation unit 3, taking continuous VLIW instruction 1, VLIW instruction 2, and VLIW instruction 3 as an example. Here, the pipeline stage, for example, slot A and slot C
D, E1, E21, W, and slot B is D, E,
N, W, and slot D is D, E, R, W, which depends on the instruction element, but is initially D which decodes the instruction.
This stage is a stage, and the last stage is a W stage for writing to a register, with two operation stages between them.

【０００４】このような４並列ＶＬＩＷプロセッサで実
行されるＶＬＩＷ命令列（ソースプログラム）は、コン
パイラでアセンブリコードに変換される際に、プログラ
ム中の命令を並び替えることで命令レベルの並列度を挙
げて実行速度を向上させるため、ＶＬＩＷプロセッサの
命令スケジューリングを次のように行う。対象プログラムをＶＬＩＷオペレーションの列に変換
する。A VLIW instruction sequence (source program) executed by such a four-parallel VLIW processor, when converted into assembly code by a compiler, increases the instruction level parallelism by rearranging the instructions in the program. In order to improve the execution speed, the instruction scheduling of the VLIW processor is performed as follows. The target program is converted into a VLIW operation sequence.

【０００５】ＶＬＩＷオペレーションの列に対して命
令スケジューリングを行う。命令スケジューリングの結果からＶＬＩＷ命令を作成
する。このような命令スケジューリングを行う場合には、図２
４に示すような資源予約表を使用する。資源予約表は、
第１リスト１００Ａと第２リスト１００Ｂをもつ。第１
リスト１００Ａには、ソースオペランドの値％ｒについ
て、実行サイクルｎ＋０，ｎ＋１，ｎ＋２，・・・の中
のレジスタ書込みの終了サイクルにオペレーション番号
を登録する。[0005] Instruction scheduling is performed for a sequence of VLIW operations. A VLIW instruction is created from the result of the instruction scheduling. When such instruction scheduling is performed, FIG.
The resource reservation table shown in FIG. The resource reservation table is
It has a first list 100A and a second list 100B. First
In the list 100A, for the value% r of the source operand, an operation number is registered in the register write end cycle in the execution cycles n + 0, n + 1, n + 2,.

【０００６】第２リスト１００Ｂは、ＶＬＩＷ命令の各
命令エレメントに対応したスロットＡ〜Ｄについて、第
１リスト１００Ａに登録された先行する命令のレジスタ
書込みの終了サイクルに、そのレジスタ書込み値をソー
スオペランドの値として使用する命令のオペレーション
番号を登録する。図２４の資源予約表のＶＬＩＷ命令１
〜３は、例えば図２５（Ａ）のアセンブリコードで表現
される。このアセンブリコードによるＶＬＩＷ命令１〜
３の意味は、図２５（Ｂ）のようになる。また図２４の
ＶＬＩＷ命令１〜３は、スロットＢの命令エレメントの
みであり、スロットＡ，Ｃ，Ｄは使用しないことからノ
ン・オペレーション命令「ｎｏｐ」を格納している。The second list 100B stores the register write values of the slots A to D corresponding to the respective instruction elements of the VLIW instruction in the register write end cycle of the preceding instruction registered in the first list 100A. Register the operation number of the instruction to be used as the value of. VLIW instruction 1 in the resource reservation table of FIG.
3 are represented by, for example, the assembly code of FIG. VLIW instructions 1 to 4 using this assembly code
The meaning of 3 is as shown in FIG. In addition, VLIW instructions 1 to 3 in FIG. 24 are only instruction elements of slot B and do not use slots A, C, and D, and therefore store a non-operation instruction "nop".

【０００７】[0007]

【発明が解決しようとする課題】しかしながら、このよ
うな従来のＶＬＩＷプロセッサによるパイプライン処理
にあっては、図２３のパイプラインの流れから明らかな
ように、あるＶＬＩＷ命令が先行するＶＬＩＷ命令の結
果をソースオペランドの値として使用する場合、先行す
るＶＬＩＷ命令のＷステージとなるレジスタ書込みサイ
クルが終了しなければ、次のＶＬＩＷ命令をパイプライ
ンに投入することができず、４ステージ分の空きサイク
ルを生じている。However, in such pipeline processing by the conventional VLIW processor, as is apparent from the flow of the pipeline in FIG. 23, a certain VLIW instruction results in a preceding VLIW instruction. Is used as the value of the source operand, the next VLIW instruction cannot be input to the pipeline unless the register write cycle that is the W stage of the preceding VLIW instruction is completed. Has occurred.

【０００８】この問題を解決するため、本願発明者等に
あっては、ソフトウェア・バイパス制御機構をもつプロ
セッサを提案している。ソフトウェア・バイパス制御機
構をもつプロセッサは、ハードウェアを簡素化するため
に、ＶＬＩＷ命令のソースオペランドの値を、レジスタ
あるいは複数のバイパス出力のいずれから読むかをソフ
トウェアで指定するという特徴をもったプロセッサであ
る。In order to solve this problem, the present inventors have proposed a processor having a software bypass control mechanism. A processor having a software bypass control mechanism has a feature that, in order to simplify hardware, a software specifies a value of a source operand of a VLIW instruction from a register or a plurality of bypass outputs. It is.

【０００９】このアーキテクチャをもつプロセッサで
は、各命令のソースオペランドの値をバイパスあるいは
レジスタから読む方法を、パイプラインのタイミングを
考慮しアセンブリコードで陽に指定する必要がある。ま
たソフトウェア・バイパス制御機構をもつプロセッサの
ためにＶＬＩＷオペレーションの命令スケジューリング
を行う場合、どのサイクルにどの命令がどのバイパスに
レジスタの値を出力したかの情報が必要になる。In the processor having this architecture, it is necessary to explicitly specify the method of bypassing or reading the value of the source operand of each instruction from the register in the assembly code in consideration of the timing of the pipeline. Also, when performing instruction scheduling of a VLIW operation for a processor having a software bypass control mechanism, information on which cycle output which register value to which instruction in which cycle is required.

【００１０】またコンパイラにあっては、一般に制御フ
ローグラフを作成し、その基本ブロック毎に命令スケジ
ューリングを行っているが、ソフトウェア・バイパス制
御機構をもつプロセッサでは、制御フローグラフの合流
点で、複数の先行基本ブロックが定義するレジスタの値
を使用する命令のレジスタ・バイパス指定をどのように
決定したらよいか、という問題が生じる。In a compiler, a control flow graph is generally created and instruction scheduling is performed for each basic block. However, in a processor having a software bypass control mechanism, a plurality of control flow graphs are set at a junction of the control flow graph. The problem arises as to how to determine the register bypass designation of the instruction using the register value defined by the preceding basic block.

【００１１】更に、制御フローグラフは一般に有向循環
グラフであるため、合流点の基本ブロックの命令スケジ
ューリングを行う時に、先行基本ブロックの資源予約表
を参照できない場合がある。本発明は、このような問題
点に鑑みてなされたもので、ソフトウェア・バイパス制
御機構をもつプロセッサについて、拡張された資源予約
表によりバイパスを考慮した命令スケジューリングを可
能とし、また制御フローグラフの基本ブロックのスケジ
ューリングの順序の決定により制御フローグラフの合流
点でのバイパスの使用を最適化するコンパイル装置及び
方法並びにコンパイル実行プログラムを記録したコンピ
ュータ読み取り可能な記録媒体を提供することを目的と
する。Further, since the control flow graph is generally a directed circulation graph, it may not be possible to refer to the resource reservation table of the preceding basic block when performing instruction scheduling of the basic block at the junction. The present invention has been made in view of such a problem, and enables an instruction scheduling considering a bypass for a processor having a software bypass control mechanism by using an extended resource reservation table. It is an object of the present invention to provide a compiling apparatus and method for optimizing use of a bypass at a junction of a control flow graph by determining a block scheduling order, and a computer-readable recording medium recording a compile execution program.

【００１２】[0012]

【課題を解決するための手段】図１は本発明の原理説明
図である。まず本発明は、図１（Ａ）のように、例えば
ＶＬＩＷ命令を分割した複数の命令エレメントをパイプ
ライン処理により並列的に実行し、このパイプライン処
理において先行命令の演算結果をレジスタに書き込む前
にバイパスに出力して後続命令のソースオペランドとし
て使用可能とし、各命令のソースオペランドをレジスタ
から読むか又は複数のバイパス出力のいずれから読むか
をアセンブリコードで陽に指定するバイパス制御機構付
きのプロセッサで実行するコンパイル装置を提供する。FIG. 1 is a diagram illustrating the principle of the present invention. First, according to the present invention, as shown in FIG. 1A, for example, a plurality of instruction elements obtained by dividing a VLIW instruction are executed in parallel by pipeline processing, and in this pipeline processing, the operation result of the preceding instruction is written in a register. A processor with a bypass control mechanism that outputs to a bypass to be used as a source operand of a subsequent instruction, and explicitly specifies in assembly code whether to read a source operand of each instruction from a register or one of a plurality of bypass outputs. Provide a compiling device to be executed by.

【００１３】本発明のコンパイル装置は、図１（Ｂ）の
ように、例えばＶＬＩＷ命令で記述された命令列である
ソースプログラムを入力するソースプログラム入力部１
２、ソースプログラム入力部１２で入力した命令列を中
間コードの命令列に変換する中間コード変換部１４、図
１（Ｄ）のような中間コードに変換された命令列の制御
フローグラフを作成して制御フローグラフの基本ブロッ
ク単位に、資源予約表の書込み部分に、オペレーション
番号とソースオペランドを読み込むバイパスを指定する
バイパス情報を登録して拡張された拡張資源予約表１８
を生成する命令スケジューリング部１６、命令スケジュ
ーリング部１６で作成した拡張資源予約表１８のオペレ
ーション番号とバイパス情報に基づいて、バイパス指定
情報を含む命令のアセンブリコードに変換するアセンブ
リコード変換部２０、及びアセンブリコード変換部２０
で変換された命令列のアセンブリコードを目的プログラ
ム２４として出力するアセンブリコード出力部２２備え
たことを特徴とする。As shown in FIG. 1B, the compiling apparatus according to the present invention comprises a source program input unit 1 for inputting a source program which is an instruction sequence described by, for example, a VLIW instruction.
2. An intermediate code conversion unit 14 that converts an instruction sequence input at the source program input unit 12 into an instruction sequence of an intermediate code, and creates a control flow graph of the instruction sequence converted into the intermediate code as shown in FIG. The extended resource reservation table 18 extended by registering bypass numbers for specifying bypasses for reading the operation number and the source operand in the write portion of the resource reservation table for each basic block of the control flow graph.
, An assembly code conversion unit 20 that converts an instruction including bypass designation information into assembly code based on the operation number and bypass information of the extended resource reservation table 18 created by the instruction scheduling unit 16, and an assembly Code conversion unit 20
And an assembly code output unit 22 for outputting the assembly code of the instruction sequence converted in the above as a target program 24.

【００１４】命令スケジューリング部１６で生成する拡
張資源予約表１８は、図１（Ｃ）のように、ソースオペ
ランドの値％ｒについて、実行サイクル毎にオペレーシ
ョン番号と演算結果が出力されるバイパス情報の組を登
録する第１リスト１８Ａと、ＶＬＩＷ命令の各命令エレ
メントに対応したスロットＢについて、実行サイクル毎
に第１リスト１８Ａから得られたバイパス情報をオペレ
ーション番号と共に登録した第２リスト１８Ｂを有す
る。As shown in FIG. 1C, the extended resource reservation table 18 generated by the instruction scheduling unit 16 stores the operation number and the operation result for each execution cycle for the value% r of the source operand. It has a first list 18A for registering a set and a second list 18B for registering bypass information obtained from the first list 18A together with an operation number for each execution cycle for the slot B corresponding to each instruction element of the VLIW instruction.

【００１５】命令スケジューリング部１６は、図１
（Ｄ）のように、制御フローグラフの中のスケジューリ
ングが終了していない基本ブロックの集合から例えばト
レース（Ｃ・Ｄ・Ｆ）を選択してスケジューリング対象
集合Ｔ（Ｃ・Ｄ・Ｆ）とする。更にトレース（Ｃ・Ｄ・
Ｆ）に合流するエッジａ２、ｂをもち、且つスケジュー
リングが終了していない基本ブロックＡ，Ｂが存在する
場合は、この基本ブロックＡ，Ｂをトレース（Ｃ・Ｄ・
Ｆ）に追加してスケジューリング対象集合Ｔ（Ｃ・Ｄ・
Ｆ・Ａ・Ｂ）とし、スケジューリング対象集合Ｔ（Ｃ・
Ｄ・Ｆ・Ａ・Ｂ）毎に基本ブロックのスケジューリング
を行う。The instruction scheduling unit 16 is configured as shown in FIG.
As shown in (D), for example, a trace (CDF) is selected from a set of basic blocks for which scheduling has not been completed in the control flow graph, and is set as a scheduling target set T (CDF). . Trace (CD ・ D ・
If there are basic blocks A and B that have edges a2 and b that join F) and scheduling has not been completed, these basic blocks A and B are traced (CDD).
F) in addition to the scheduling target set T (CD
F · A · B) and the scheduling target set T (C · C
The basic block is scheduled for each of DFAB.

【００１６】命令スケジューリング部１６は、スケジュ
ーリング対象集合Ｔに含まれる複数の基本ブロックにつ
いて、先行基本ブロックの資源予約表を参照できない場
合を回避するため、先行基本ブロックが存在しない基本ブロック、全ての先行基本ブロックのスケジューリングが終了し
ているが自己のスケジューリングが終了していない基本
ブロック、先行基本ブロックにスケジューリングが終了した基本
ブロックをもち、且つスケジューリングが終了していな
い先行基本ブロックの数が最も少ない基本ブロック、を順次選択して基本ブロックのスケジューリングを繰り
返す。The instruction scheduling unit 16 is used to avoid a case where the resource reservation table of the preceding basic block cannot be referred to for a plurality of basic blocks included in the scheduling target set T. A basic block for which scheduling of a basic block has been completed but its own scheduling has not been completed. A basic block which has a basic block for which scheduling has been completed for the preceding basic block and has the least number of preceding basic blocks for which scheduling has not been completed. The block and are sequentially selected to repeat the basic block scheduling.

【００１７】命令スケジューリング部１６は、スケジュ
ーリング対象集合Ｔ例えば集合Ｔ（Ｃ・Ｄ・Ｆ・Ａ・
Ｂ）に含まれる複数の基本ブロックについて、制御フロ
ーグラフ合流点の最適化のため、合流点の基本ブロック
Ｄに対し先行する直前の複数の基本ブロックＢ，Ｃの命
令スケジューリングが全て終了している場合には、合流
点の基本ブロックＤの拡張資源予約表１６を参照してス
ケジューリングする。The instruction scheduling unit 16 includes a set T to be scheduled, for example, a set T (CDFAA.
For the plurality of basic blocks included in B), the instruction scheduling of the plurality of basic blocks B and C immediately before the basic block D at the merge point has been completed in order to optimize the merge point of the control flow graph. In this case, scheduling is performed with reference to the extended resource reservation table 16 of the basic block D at the junction.

【００１８】また合流点の基本ブロックＤのスケジュー
リングで、先行する複数の基本ブロックＢ，Ｃが定義す
るレジスタの値を使用するオペレーションが合流点の基
本ブロックＤに存在する場合には、先行する基本ブロッ
クＢ又はＣの空きスロットにレジスタ間のムーブ命令を
生成することで、合流の基本ブロックＤの入口でのバイ
パスのタイミングを合わせ、バイパスのタイミングを合
わせることができない場合には、オペランドの値をレジ
スタから読むようにスケジューリングする。In the scheduling of the basic block D at the junction, if an operation using the register value defined by a plurality of preceding basic blocks B and C exists in the basic block D at the junction, By generating a move instruction between registers in an empty slot of the block B or C, the timing of bypass at the entrance of the merging basic block D is adjusted. If the timing of bypass cannot be adjusted, the value of the operand is changed. Schedule to read from register.

【００１９】命令スケジューリング部１６は、スケジュ
ーリング対象集合Ｔに含まれる複数の基本ブロック、例
えば基本ブロックＡ，Ｂ，Ｃ，Ｄ，Ｆについて、制御フ
ローグラフ合流点の最適化のため、合流点の基本ブロッ
クＤに対し先行する複数の基本ブロックＢ，Ｃの命令ス
ケジューリングが終了していない場合には、命令スケジ
ューリングが終了していない先行基本ブロックＢ，Ｃ以
前の基本ブロックが定義するオペランドを使用するオペ
レーションは、レジスタへの書込みが完了するサイクル
以降に実行されるようにスケジューリングする。The instruction scheduling unit 16 performs a basic operation on a plurality of basic blocks included in the scheduling target set T, for example, the basic blocks A, B, C, D, and F, in order to optimize the control flow graph junction. If the instruction scheduling of a plurality of basic blocks B and C preceding the block D is not completed, an operation using an operand defined by a basic block before the preceding basic blocks B and C whose instruction scheduling has not been completed. Is scheduled to be executed after the cycle in which writing to the register is completed.

【００２０】ここで、任意の基本ブロックＸ以前の基本
ブロック」とは、基本ブロックＸと基本ブロックＸに制
御が到達する基本ブロックを意味する。このため先行基
本ブロックＢ，Ｃ以前の基本ブロックとは、基本ブロッ
クＡ，Ｂ，Ｃ，Ｄ，Ｆを意味する。基本ブロックＤ，Ｆ
が含まれるのは、ループの形成により基本ブロックＤに
とって基本ブロックＦが先行基本ブロックとなり、また
基本ブロックＦにとっては、基本ブロックＤが先行基本
ブロックとなるからである。Here, "a basic block before any basic block X" means a basic block X and a basic block whose control reaches the basic block X. Therefore, the basic blocks before the preceding basic blocks B and C mean the basic blocks A, B, C, D and F. Basic blocks D and F
Is included because the basic block F becomes the preceding basic block for the basic block D due to the formation of the loop, and the basic block D becomes the preceding basic block for the basic block F.

【００２１】それ以外のレジスタへの書込みに関して
は、先行基本ブロックＢ又はＣの空きスロットにレジス
タ間のムーブ命令を生成することで、合流点の基本ブロ
ックＤの入口でのバイパスのタイミングを合わせ、バイ
パスのタイミングを合わせることができない場合には、
オペランドの値をレジスタから読むようにスケジューリ
ングする。For writing to the other registers, a move instruction between registers is generated in an empty slot of the preceding basic block B or C, so that the bypass timing at the entrance of the basic block D at the junction is adjusted. If you ca n’t match the bypass timing,
Schedule the operand value to be read from a register.

【００２２】また本発明は、ソフトウェア・バイパス制
御機構を備えたプロセッサで実行するコンパイル方法を
提供する。このコンパイル方法は、ソースファイルから命令で記述された命令列を入力す
るソースプログラム入力過程；ソースファイル入力過程で入力した命令列を中間コー
ドの命令列に変換する中間コード変換過程；中間コードに変換された命令列の制御フローグラフを作
成し、制御フローグラフの基本ブロック単位に、資源予
約表の書込み部分に、オペレーション番号とソースオペ
ランドを読み込むバイパスを指定するバイパス情報を登
録して拡張資源予約表１８を生成する命令スケジューリ
ング過程；命令スケジューリング過程で作成した拡張資
源予約表１８のオペレーション番号とバイパス情報に基
づいて、バイパス指定情報を含む命令のアセンブリコー
ドに変換するアセンブリコード変換過程、を備える。こ
の方法の詳細は、装置の場合と同じである。The present invention also provides a compiling method executed by a processor having a software bypass control mechanism. This compiling method includes a source program input step of inputting an instruction sequence described by an instruction from a source file; an intermediate code conversion step of converting the instruction sequence input in the source file input step into an intermediate code instruction sequence; A control flow graph of the instruction sequence is created, and in a basic block unit of the control flow graph, an operation number and bypass information for specifying a bypass for reading a source operand are registered in a write portion of the resource reservation table, and an extended resource reservation table is registered. And an assembly code conversion step of converting an instruction including bypass designation information into an assembly code based on the operation number and the bypass information of the extended resource reservation table 18 created in the instruction scheduling step. The details of this method are the same as those of the apparatus.

【００２３】アセンブリコード変換過程で変換された命
令列のアセンブリコードを出力するアセンブリコード出
力過程、を備える。このように本発明によれば、ソフト
ウェア・バイパス制御機構をもつプロセッサに対して、
拡張資源管理表を使用することにより、バイパスを考慮
した命令スケジューリングを行うことができる。An assembly code output step of outputting an assembly code of the instruction sequence converted in the assembly code conversion step is provided. Thus, according to the present invention, for a processor having a software bypass control mechanism,
By using the extended resource management table, instruction scheduling in consideration of bypass can be performed.

【００２４】また本発明によれば、ソフトウェア・バイ
パス制御機構をもつプロセッサに対して、ソフトウェア
・バイパス制御機構をもつプロセッサに特有の問題であ
る制御フローグラフの合流点でのバイパスの使用を、制
御フローグラフの基本ブロックのスケジューリングの順
序を決定することによって、効率良く行うことを可能に
する。According to the present invention, the use of the bypass at the junction of the control flow graph, which is a problem unique to the processor having the software bypass control mechanism, is controlled for the processor having the software bypass control mechanism. By determining the order of scheduling the basic blocks of the flow graph, it is possible to perform the scheduling efficiently.

【００２５】更に本発明は、命令を分割した複数の命令
エレメントをパイプライン処理により並列的に実行し、
パイプライン処理において先行命令の演算結果をレジス
タに書き込む前にバイパスに出力して後続命令のソース
オペランドとして使用可能とし、各命令のソースオペラ
ンドをレジスタから読むか又は複数のバイパス出力のい
ずれかから読むかをアセンブリコードで陽に指定するバ
イパス制御機構付きのプロセッサで実行する命令列をコ
ンパイルするコンパイル実行プログラムを記録したコン
ピュータ読み取り可能な記録媒体を提供する。Further, according to the present invention, a plurality of instruction elements obtained by dividing an instruction are executed in parallel by pipeline processing,
In pipeline processing, the operation result of the preceding instruction is output to a bypass before being written to a register and can be used as a source operand of a subsequent instruction, and the source operand of each instruction is read from a register or one of a plurality of bypass outputs. Provided is a computer-readable recording medium that stores a compile execution program that compiles an instruction sequence to be executed by a processor having a bypass control mechanism that explicitly specifies the above in an assembly code.

【００２６】このコンパイル実行プログラムを記録した
コンピュータ読み取り可能な記録媒体は、ＶＬＩＷ命令
で記述された命令列のソースプログラムを入力するソー
プログラム入力部と、記ソースプログラム入力部で入力
した命令列を中間コードの命令列に変換する中間コード
変換部と、中間コードに変換された命令列の制御フロー
グラフを作成し、制御フローグラフの基本ブロック単位
に、資源予約表の書込部分に、オペレーション番号とソ
ースオペランドを読み込むバイバスを指定するバイパス
情報を登録して拡張資源予約表を生成する命令スケジュ
ーリング部と、命令スケジューリング部で作成した拡張
資源予約表のオペレーション番号とバイパス情報に基づ
いて、バイパス情報を含む命令のアセンブリコードに変
換するアセンブリコード変換部と、アセンブリコード変
換部で変換された命令列の目的プログラムを出力するア
センブリコード出力部と、を備えたことを特徴とする。A computer-readable recording medium on which the compile execution program is recorded has a saw program input unit for inputting a source program of an instruction sequence described by a VLIW instruction, and an instruction sequence input on the source program input unit. An intermediate code conversion unit for converting the instruction sequence into a code instruction sequence, and a control flow graph for the instruction sequence converted into the intermediate code, and an operation number and An instruction scheduling unit for registering bypass information specifying a bypass for reading a source operand to generate an extended resource reservation table, and including bypass information based on the operation number and bypass information of the extended resource reservation table created by the instruction scheduling unit Assembly to be converted to instruction assembly code Wherein the over de conversion unit, and the assembly code output unit for outputting the object program of the converted instruction sequence in assembly code conversion unit, further comprising: a.

【００２７】[0027]

【発明の実施の形態】図２は本発明によるコンパイル装
置の機能ブロック図である。本発明のコンパイル装置
は、ソースプログラム入力部１２、中間コード変換部１
４、命令スケジューリング部１６、アセンブリコード変
換部２０、アセンブリコード出力部２２で構成される。
ソースプログラム入力部１２は、Ｃ言語等のプログラミ
ング言語で書かれたソースプログラム１０を入力する。FIG. 2 is a functional block diagram of a compiling device according to the present invention. The compiling device of the present invention includes a source program input unit 12, an intermediate code conversion unit 1,
4, an instruction scheduling unit 16, an assembly code conversion unit 20, and an assembly code output unit 22.
The source program input unit 12 inputs the source program 10 written in a programming language such as C language.

【００２８】即ち、プログラミング言語で書かれたソー
スプログラム１０をトークンと呼ばれる単語ごとに区切
り、その単語の正当性をチェックする。ソースプログラ
ミング入力部１２でプログラミング言語の正当性のチェ
ックが済んだソースプログラムは、次の中間コード変換
部１４で中間コードに変換される。この中間コード変換
部１４による中間コードへの返還は次の手順で実行され
る。That is, the source program 10 written in the programming language is divided into words called tokens, and the validity of the words is checked. The source program for which the validity of the programming language has been checked by the source programming input unit 12 is converted into an intermediate code by the next intermediate code conversion unit 14. The return to the intermediate code by the intermediate code conversion unit 14 is executed in the following procedure.

【００２９】ソースプログラムをトークンに切り分け
る。トークンの並びに対して構文解析を行う。即ちプログ
ラミング言語の文法に合致しているか否かをチェックす
る。トークンの並びが正しければ、その意味を表わす中間
コードを構文解析の結果に基づいて生成する。The source program is divided into tokens. Parse a sequence of tokens. That is, it checks whether or not it conforms to the grammar of the programming language. If the sequence of the tokens is correct, an intermediate code representing the meaning is generated based on the result of the syntax analysis.

【００３０】中間コードに変換されたソースプログラム
は命令スケジューリング部１６に出力される。本発明に
あっては、この命令スケジューリングにおいて制御フロ
ーグラフを生成してソースプログラムの実行手順を決め
る。更に本発明のコンパイル装置にあっては、命令のソ
ースオペランドの値をレジスタから読むかまたは複数の
バイパス出力のいずれかから読むかをアセンブリコード
で陽に指定するバイパス制御機構を備えたプロセッサで
実行するＶＬＩＷ命令列のコンパイルであることから、
従来のＶＬＩＷ命令列のコンパイルに使用している資源
予約表の機能拡張を図った拡張資源予約表１８を生成す
る処理を行う。The source program converted into the intermediate code is output to the instruction scheduling section 16. According to the present invention, in this instruction scheduling, a control flow graph is generated to determine the execution procedure of the source program. Further, in the compiling apparatus of the present invention, execution by a processor having a bypass control mechanism that explicitly specifies in assembly code whether to read the value of the source operand of an instruction from a register or one of a plurality of bypass outputs Compiles the VLIW instruction sequence
A process is performed to generate an extended resource reservation table 18 in which functions of the resource reservation table used for compiling the conventional VLIW instruction sequence are extended.

【００３１】この拡張資源予約表１８は、中間コードに
変換された命令列の制御フローグラフを作成した後に、
制御フローグラフの基本ブロック単位の資源予約表の書
込み部分に、従来のオペレーション番号に加えて本発明
にあっては新たにソースオペランドの値を読み込むバイ
パスを指定するバイパス情報を登録して、拡張された資
源予約表を生成している。The extended resource reservation table 18 is obtained by creating a control flow graph of an instruction sequence converted into an intermediate code,
In the write portion of the resource reservation table for each basic block of the control flow graph, in addition to the conventional operation number, in the present invention, bypass information for specifying a bypass for newly reading the value of the source operand is registered and expanded. A resource reservation table is created.

【００３２】命令スケジューリング部１６で中間コード
に変換されたソースプログラムの命令列について作成さ
れた制御フローグラフにおける基本ブロック単位ごとの
拡張資源予約表１８の生成が済むと、次のアセンブリコ
ード変換部２０において、命令スケジューリング部１６
で作成された拡張資源予約表１８のオペレーション番号
とバイパス指定情報に基づいて、バイパス指定情報を含
むＶＬＩＷ命令のアセンブリコードに変換する。When the extended resource reservation table 18 for each basic block unit in the control flow graph created for the instruction sequence of the source program converted into the intermediate code by the instruction scheduling unit 16 is completed, the next assembly code conversion unit 20 In the instruction scheduling unit 16
Is converted into an assembly code of a VLIW instruction including bypass designation information based on the operation number and bypass designation information of the extended resource reservation table 18 created in step (1).

【００３３】最終的にアセンブリコード出力部２２で中
間コードを、ターゲットであるソフトウェア・バイパス
制御機構を備えたプロセッサで実行可能なアセンブリコ
ードに変換された目的プログラム２４として対象プロセ
ッサのＲＯＭやディスクドライブの媒体等に出力する。
次に本発明のコンパイル装置及び方法が対象とするソフ
トウェア・バイパス制御機構を備えたプロセッサを説明
する。Finally, the assembly code output unit 22 converts the intermediate code into a target program 24 which is converted into an assembly code executable by a processor having a software bypass control mechanism as a target. Output to media etc.
Next, a description will be given of a processor having a software bypass control mechanism to which the compiling device and method of the present invention are applied.

【００３４】図３は本発明が対象とするソフトウェア・
バイパス制御機構を備えたプロセッサのブロック図であ
り、命令実行部２６、入力部として機能するバスユニッ
トインタフェース２８、コンパイルされたＶＬＩＷ命令
列のソースプログラムを格納したコードＲＡＭ３０、及
び出力部として機能するバスユニットインタフェース３
２で構成される。FIG. 3 is a diagram showing software and software to which the present invention is applied.
FIG. 4 is a block diagram of a processor having a bypass control mechanism, which includes an instruction execution unit 26, a bus unit interface 28 functioning as an input unit, a code RAM 30 storing a source program of a compiled VLIW instruction sequence, and a bus functioning as an output unit Unit interface 3
It consists of two.

【００３５】命令実行部２６は、例えば幾何学変換処理
を実行するもので、４並列ＶＬＩＷ方式の命令実行形式
としている。命令実行部２６には１６エントリ３２ビッ
トの整数レジスタ４０、３２エントリ３２ビットの浮動
小数点レジスタ３８，４２、３２ビットの浮動小数点演
算パイプライン４４，４６，５２，５４、３２ビットの
整数演算パイプライン４８、アドレス計算用の１６ビッ
トの加算器５０を備えている。更に、２個に分かれたデ
ータＲＡＭ３４−１，３４−２を有し、このデータＲＡ
Ｍ３４−１，３４−２へのロード／ストアを６４ビット
幅で並列的に行うことができる。The instruction execution unit 26 executes, for example, a geometric transformation process, and has a 4-parallel VLIW instruction execution format. The instruction execution unit 26 includes a 16-entry 32-bit integer register 40, a 32-entry 32-bit floating-point register 38, 42, a 32-bit floating-point arithmetic pipeline 44, 46, 52, 54, and a 32-bit integer arithmetic pipeline. 48, a 16-bit adder 50 for address calculation. Further, it has two separate data RAMs 34-1 and 34-2.
Load / store to M34-1 and M4-2 can be performed in parallel with a 64-bit width.

【００３６】図３のソフトウェア・バイパス制御機構付
のプロセッサで実行されるＶＬＩＷ命令は、図２１の従
来例に示したように、１つのＶＬＩＷ命令１を４つの命
令エレメント２Ａ〜２Ｄに分けて構成しており、各エレ
メント２Ａ〜２Ｄの位置をスロットと言い、それぞれス
ロットＡ，スロットＢ，スロットＣ，スロットＤとして
おり、１つのＶＬＩＷ命令１に含まれる４個の命令エレ
メント２Ａ〜２Ｄは同時に実行される。The VLIW instruction executed by the processor having the software bypass control mechanism shown in FIG. 3 is constituted by dividing one VLIW instruction 1 into four instruction elements 2A to 2D as shown in the conventional example of FIG. The position of each of the elements 2A to 2D is called a slot, which is called a slot A, a slot B, a slot C, and a slot D, respectively. The four instruction elements 2A to 2D included in one VLIW instruction 1 execute simultaneously. Is done.

【００３７】図３のプロセッサの４並列ＶＬＩＷ方式の
アーキテクチャについては、複数種類の命令エレメント
が予め定義されており、各命令エレメントはその動作内
容によって幾つかの命令カテゴリに分けることができ
る。この命令カテゴリとしては、例えば浮動小数点演算
命令、浮動小数点比例命令、浮動小数点変換命令、整数
演算命令、整数比較命令、ロード／ストア命令、分岐命
令、ＦＲ転送命令、ＩＲ転送命令、制御命令、ｎｏｐ命
令に分けられる。In the four-parallel VLIW architecture of the processor shown in FIG. 3, a plurality of types of instruction elements are defined in advance, and each instruction element can be divided into several instruction categories according to the operation contents. The instruction categories include, for example, a floating-point operation instruction, a floating-point proportional instruction, a floating-point conversion instruction, an integer operation instruction, an integer comparison instruction, a load / store instruction, a branch instruction, an FR transfer instruction, an IR transfer instruction, a control instruction, and a nop. Divided into instructions.

【００３８】また各カテゴリに属する命令エレメントを
配置できるスロットＡ〜Ｄの位置は決まっており、例え
ばスロットＢには整数演算命令、整数比較命令及びロー
ド／ストア命令しか置くことができず、これを他のスロ
ットにおくことはできない。本発明のコンパイル装置が
対象とするソフトウェア・バイパス制御機構を備えたプ
ロセッサにあっては、命令のパイプラインレベルの動作
がアーキテクチャの定義に含まれる点に大きな特徴をも
っている。換言すると、パイプライン動作を理解しない
と正しいＶＬＩＷ命令のプログラムを書くことができな
い。ほとんど全ての命令はソースオペランドの値として
何らかのデータを入力して処理を行い、結果を出力す
る。The positions of slots A to D in which instruction elements belonging to each category can be arranged are fixed. For example, in slot B, only integer operation instructions, integer comparison instructions and load / store instructions can be placed. It cannot be placed in another slot. The processor provided with the software bypass control mechanism to which the compiling device of the present invention is applied is characterized in that the operation at the pipeline level of the instruction is included in the definition of the architecture. In other words, it is not possible to write a correct VLIW instruction program without understanding the pipeline operation. Almost all instructions process by inputting some data as the value of the source operand, and output the result.

【００３９】本発明が対象とするソフトウェア・バイパ
ス制御機構付のプロセッサにあっては、命令のパイプラ
インレベルの定義において、処理内容のみならず、各命
令がどのパイプラインステージで入力や出力を行うかが
定められている。またデータの入力元や出力先はレジス
タファイルやメモリといった記憶領域に限らず、先行命
令の演算結果をレジスタを書き込む前に後続命令のソー
スオペランドとして用いるバイパスを設けており、ソー
スオペランドをバイパスから入力したり、演算結果をバ
イパスに出力したりする場合がある。即ち本発明が対象
とするプロセッサにあっては、バイパスも１つのリ・ソ
ースとして捉えなければならない。In a processor with a software bypass control mechanism to which the present invention is applied, in the definition of the pipeline level of an instruction, not only the processing content but also the input and output of each pipeline stage in each instruction are performed. Has been determined. In addition, the input source and output destination of data are not limited to storage areas such as register files and memories.A bypass is provided to use the operation result of the preceding instruction as the source operand of the subsequent instruction before writing the register. Or output the operation result to the bypass. That is, in the processor targeted by the present invention, the bypass must be regarded as one resource.

【００４０】図４は図３のソフトウェア・バイパス制御
機構付プロセッサのパイプライン動作によるパイプライ
ンの流れを示している。１つのＶＬＩＷ命令を構成する
４つの各命令エレメントは、必要な処理を複数のステッ
プに分けて実行する。分割されたステップの１つ１つが
パイプラインステージであり、１つのパイプラインステ
ージが１サイクルで実行される。FIG. 4 shows the flow of the pipeline by the pipeline operation of the processor with the software bypass control mechanism of FIG. Each of the four instruction elements constituting one VLIW instruction executes necessary processing in a plurality of steps. Each of the divided steps is a pipeline stage, and one pipeline stage is executed in one cycle.

【００４１】スロットＡ〜Ｄの各パイプラインステージ
には名前が付いている。例えばスロットＡとスロットＣ
はＤ，Ｅ１，Ｅ２，Ｗであり、スロットＢはＤ，Ｅ，
Ｎ，Ｗであり、スロットＤはＤ，Ｅ，Ｒ，Ｗである。こ
のようにパイプラインステージはスロットに対する命令
の種類によって若干の相違があるが、どの種類の命令も
最初のパイプラインステージは命令解読のＤステージで
あり、最後はレジスタ書込みのＷステージとなり、その
間に２つの演算ステージを設けている。Each pipeline stage in slots A to D has a name. For example, slot A and slot C
Are D, E1, E2, W, and slot B is D, E, E
N, W, and slot D is D, E, R, W. As described above, the pipeline stages are slightly different depending on the type of the instruction for the slot, but for any type of instruction, the first pipeline stage is the D stage for instruction decoding, and the last is the W stage for register writing. Two operation stages are provided.

【００４２】図４のパイプラインの流れから明らかなよ
うに、例えば最初のＶＬＩＷ命令１に含まれる４個の命
令エレメントは、サイクルｎ＋０で同時にＤステージに
投入される。そしてサイクルｎ＋１，サイクルｎ＋２と
進行した後に、最終的にサイクルｎ＋３でレジスタ書込
みのＷステージの最終ステージとなる。ＶＬＩＷ命令１
に続く次のＶＬＩＷ命令２は、必ずサイクル的に連続し
ており、サイクルｎ＋１でＤステージに投入される。後
続するＶＬＩＷ命令３，ＶＬＩＷ命令４，ＶＬＩＷ命令
５，・・・についても同様である。このため、続けて実
行されるＶＬＩＷ命令１，２，３，４，・・・の間に
は、図２２に示した従来のプロセッサのパイプラインの
流れのように４ステージ分の空きは生ずることがなく、
その分、高速で命令を実行することができる。As is apparent from the flow of the pipeline in FIG. 4, for example, four instruction elements included in the first VLIW instruction 1 are simultaneously input to the D stage in cycle n + 0. After proceeding with cycle n + 1 and cycle n + 2, the final stage of the W stage for register writing is finally provided in cycle n + 3. VLIW instruction 1
, The next VLIW instruction 2 is always continuous in a cycle, and is input to the D stage at cycle n + 1. The same applies to the following VLIW instruction 3, VLIW instruction 4, VLIW instruction 5, .... Therefore, there is a gap of four stages between the successively executed VLIW instructions 1, 2, 3, 4,... As in the flow of the pipeline of the conventional processor shown in FIG. Without
Accordingly, instructions can be executed at high speed.

【００４３】図５は図３のプロセッサにおけるスロット
Ｂの動作ブロックを取り出しており、整数演算パイプラ
イン４８、ロード／ストアの対象となるデータＲＡＭを
表したデータＲＡＭブロック３４、浮動小数点レジスタ
３８、整数レジスタ４０から構成される。整数演算パイ
プライン４８で計算された結果は、規定されたタイミン
グに従って出力パス４８ａ、Ｅバイパス４８ｂ、Ｎバイ
パス４８ｃ、Ｗバイパス４８ｄ及びアドレスパス４８ｅ
に出力される。出力パス４８ａは整数レジスタ４０への
書込みデータとなり、Ｅバイパス４８ｂ，Ｎバイパス４
８ｃ及びＷバイパス４８ｄはバイパスとしてソースオペ
ランドの候補になる。またアドレスパス４８ｄの値はア
ドレス情報としてデータＲＡＭブロック３４に通知され
る。FIG. 5 shows the operation block of slot B in the processor of FIG. 3, and includes an integer operation pipeline 48, a data RAM block 34 representing a data RAM to be loaded / stored, a floating point register 38, and an integer. It comprises a register 40. The result calculated by the integer operation pipeline 48 is output to an output path 48a, an E bypass 48b, an N bypass 48c, a W bypass 48d, and an address path 48e in accordance with prescribed timing.
Is output to The output path 48a becomes write data to the integer register 40, and the E bypass 48b and the N bypass 4
8c and W bypass 48d are candidates for source operands as bypasses. The value of the address path 48d is notified to the data RAM block 34 as address information.

【００４４】整数演算パイプライン４８は２個のソース
オペランドをとり、それぞれソースオペランド６０Ａ，
６０Ｂのようにグループ化される。ソースオペランド６
０Ａの候補としては、整数レジスタ４０からの入力パス
４０ａ、Ｅバイパス４８ｂ、Ｎバイパス４８ｃ、Ｗバイ
パス４８ｄ、ロードバイパス６４があり、これらの中か
ら１つが選ばれてソースオペランド６０Ａとなる。The integer operation pipeline 48 takes two source operands, each of which has a source operand 60A,
Grouped like 60B. Source operand 6
As candidates for 0A, there are an input path 40a from the integer register 40, an E bypass 48b, an N bypass 48c, a W bypass 48d, and a load bypass 64, and one of these is selected as the source operand 60A.

【００４５】またソースオペランド６０Ｂの候補として
は、整数レジスタ４０からの入力パス４０ｂ、Ｅバイパ
ス４８ｂ、Ｎバイパス４８ｃ，Ｗバイパス４８ｄ，測値
パス６２及びロードバイパス６４があり、これらの中か
ら１つが選ばれてソースオペランド６０Ｂとなる。図５
のスロットＢで実行される命令エレメントは、整数演算
命令、整数比例命令、ロード／ストア命令のいずれかで
ある。図６はその内の整数演算命令を示しており、Ｄ，
Ｅ，Ｎ，Ｗの４段のパイプラインステージで実行され
る。即ちＤステージでソースオペランドを読み込み、Ｅ
ステージで演算結果をＥバイパス４８ｂに出力し、Ｎス
テージで演算結果をＮバイパス４８ｃに出力し、最終的
にＷステージで演算結果をＷバイパス４８ｄに出力し、
且つ演算結果を整数レジスタ４０に書き込む。この点は
整数比較命令及びロード／ストア命令についても基本的
に同じである。As candidates for the source operand 60B, there are an input path 40b from the integer register 40, an E bypass 48b, an N bypass 48c, a W bypass 48d, a measurement path 62, and a load bypass 64, and one of these is one of them. Selected as source operand 60B. FIG.
Is an integer operation instruction, an integer proportional instruction, or a load / store instruction. FIG. 6 shows an integer operation instruction among them.
It is executed in four pipeline stages of E, N and W. That is, the source operand is read in the D stage,
The stage outputs the operation result to the E bypass 48b, the N stage outputs the operation result to the N bypass 48c, and finally the W stage outputs the operation result to the W bypass 48d.
In addition, the operation result is written into the integer register 40. This point is basically the same for the integer comparison instruction and the load / store instruction.

【００４６】図７は図３に示したソフトウェア・バイパ
ス制御機構を備えたプロセッサで実行される図２の本発
明によるアセンブリ装置の全体的な処理動作のフローチ
ャートである。このフローチャートのステップＳ１〜Ｓ
５は、図２の機能ブロックにおけるソースプログラム入
力部１２、中間コード変換部１４、命令スケジューリン
グ部１６、アセンブリコード変換部２０及びアセンブリ
コード出力部２２の処理機能に対応する。FIG. 7 is a flowchart of the overall processing operation of the assembly apparatus according to the present invention of FIG. 2 executed by a processor having the software bypass control mechanism shown in FIG. Steps S1 to S in this flowchart
5 corresponds to the processing functions of the source program input unit 12, the intermediate code conversion unit 14, the instruction scheduling unit 16, the assembly code conversion unit 20, and the assembly code output unit 22 in the functional blocks in FIG.

【００４７】図８は図７のステップＳ３に示した中間コ
ードの命令スケジューリングの詳細であり、中間コード
から制御フローグラフを作成し、命令スケジューリング
を行うスケジューリング対象ブロックの集合の生成処理
を行っている。図８において、本発明の命令スケジュー
リングにあっては、まずステップＳ１で中間コードに変
換されたソースプログラムを対象に中間コードの制御フ
ローグラフを作成する。図９は図８のステップＳ１で作
成される制御フローグラフの一例であり、制御フローグ
ラフは分岐部分から合流部分までを１つの基本ブロック
として作成された有効循環グラフである。FIG. 8 shows details of the instruction scheduling of the intermediate code shown in step S3 of FIG. 7. A control flow graph is created from the intermediate code, and a process of generating a set of blocks to be scheduled for instruction scheduling is performed. . In FIG. 8, in the instruction scheduling of the present invention, a control flow graph of the intermediate code is created for the source program converted into the intermediate code in step S1. FIG. 9 is an example of the control flow graph created in step S1 of FIG. 8, and the control flow graph is an effective circulation graph created from the branch portion to the merge portion as one basic block.

【００４８】図９にあっては、基本ブロックＡ〜Ｊの１
０ブロックで構成されている。各ブロックの後続する基
本ブロックへの分岐ラインはエッジと呼ばれる。例えば
基本ブロックＡはエッジａ１，ａ２を有する。このエッ
ジのそれぞれには制御フローグラフの実行回数が表示さ
れる。例えば基本ブロックＡのエッジａ１の実行回数は
１０００回、エッジａ２の実行回数も１０００回となっ
ている。再び図８を参照するに、ステップＳ１で図９の
ような制御フローグラフが作成されたならば、ステップ
Ｓ２に進み、まだスケジューリングを行っていない基本
ブロックが存在するか否かチェックし、スケジューリン
グを行っていない基本ブロックが存在すれば、ステップ
Ｓ３〜Ｓ８の処理を繰り返す。In FIG. 9, one of the basic blocks A to J
It is composed of 0 blocks. The branch line of each block to the subsequent basic block is called an edge. For example, the basic block A has edges a1 and a2. The number of executions of the control flow graph is displayed on each of the edges. For example, the number of executions of the edge a1 of the basic block A is 1000 times, and the number of executions of the edge a2 is also 1000 times. Referring again to FIG. 8, if the control flow graph as shown in FIG. 9 is created in step S1, the process proceeds to step S2, where it is checked whether or not there is a basic block that has not yet been scheduled. If there is a basic block that has not been performed, the processing of steps S3 to S8 is repeated.

【００４９】ステップＳ３〜Ｓ５にあっては、スケジュ
ーリングを行うための基本ブロックの集合としてスケジ
ューリング対象集合Ｔを生成する。即ち、通常のプログ
ラムにあっては、図９の制御フローグラフのように実行
時にエッジの実行回数に偏りが存在する。本発明が対象
とするソフトウェア・バイパス制御機構を備えたプロセ
ッサでは、実行回数の多いエッジのソースとディストネ
ーションの基本ブロックの間で効率良くバイパスを使っ
てＶＬＩＷ命令を生成することが望ましい。In steps S3 to S5, a scheduling target set T is generated as a set of basic blocks for performing scheduling. That is, in a normal program, there is a bias in the number of executions of the edge at the time of execution as shown in the control flow graph of FIG. In a processor having a software bypass control mechanism according to the present invention, it is desirable to efficiently generate a VLIW instruction using a bypass between a source block of an edge that is frequently executed and a basic block of a destination.

【００５０】このため、プロファイル情報を利用して図
９のように各エッジごとに実行回数を求め、実行回数の
多いエッジのバイパス使用を効率良くするため、図８の
ステップＳ３〜Ｓ６の処理を繰り返す。まずステップＳ
３にあっては、まだスケジューリングしていない基本ブ
ロックの集合から実行回数の多いエッジをもつトレース
Ｔを選択する。For this reason, as shown in FIG. 9, the number of executions is obtained for each edge by using the profile information, and the processing of steps S3 to S6 in FIG. repeat. First, step S
In No. 3, a trace T having an edge with a large number of executions is selected from a set of basic blocks not yet scheduled.

【００５１】図９の制御フローグラフにあっては、例え
ば基本ブロックＣのエッジｃが５１０００回、基本ブロ
ックＤのエッジｄ２が同じく５１０００回と実行回数が
最も大きいことから、基本ブロックＣ，Ｄ，Ｆを結ぶト
レースＴを選択する。これをトレースＴ（Ｃ，Ｄ，Ｆ）
と表す。次にステップＳ４でトレースＴ（Ｃ，Ｄ，Ｆ）
を選択した状態で、まだスケジューリングしていない基
本ブロックでトレースＴ（Ｃ，Ｄ，Ｆ）に含まれず、且
つ後続する基本ブロックがトレースＴ（Ｃ，Ｄ，Ｆ）に
含まれる先行する基本ブロックの集合Ｐを求める。この
ような集合Ｐとしては、図９のトレースＴ（Ｃ，Ｄ，
Ｆ）に対し基本ブロックＡと基本ブロックＢがあり、集
合Ｐは集合Ｐ（Ａ，Ｂ）と表される。この集合Ｐは別の
表現で言うと、ステップＳ３で求めたトレースＴ（Ｃ，
Ｄ，Ｆ）に合流するエッジａ２，ｂをもち、且つスケジ
ューリングが終了していない基本ブロックＡ，Ｂと言う
こともできる。In the control flow graph of FIG. 9, for example, the edge block c of the basic block C is 51,000 times, and the edge d2 of the basic block D is 51000 times. A trace T connecting F is selected. Trace T (C, D, F)
It expresses. Next, in step S4, trace T (C, D, F)
Is selected, the basic block not yet scheduled is not included in the trace T (C, D, F), and the succeeding basic block is included in the preceding basic block included in the trace T (C, D, F). Find the set P. As such a set P, the trace T (C, D,
F) has a basic block A and a basic block B, and the set P is represented as a set P (A, B). In other words, the set P is a trace T (C, C,
D, F) can also be referred to as basic blocks A and B having edges a2 and b merging with each other and having not yet completed scheduling.

【００５２】ステップＳ４で集合Ｐ（Ａ，Ｂ）が求めら
れたならば、ステップＳ５に進み、ステップＳ３で求め
たトレースＴ（Ｃ，Ｄ，Ｆ）にステップＳ４で求めた集
合Ｐ（Ａ，Ｂ）を加算してスケジューリング対象集合Ｔ
´する。この場合、スケジューリング対象集合ＴはＴ´
（Ｃ，Ｄ，Ｆ，Ａ，Ｂ）となる。続いてステップＳ６
で、ステップＳ５の集合Ｔにまだスケジューリングして
いない基本ブロックが存在するか否かチェックし、基本
ブロックが存在しなくなるまでステップＳ６，Ｓ７，Ｓ
８からの処理を繰り返す。スケジューリング対象集合Ｔ
´に対するステップＳ６，Ｓ７，Ｓ８に従った基本ブロ
ックのスケジューリングが全て終了したならば、ステッ
プＳ２に進み、ステップＳ６でスケジューリングを実行
する基本ブロックを見つける。When the set P (A, B) is obtained in step S4, the process proceeds to step S5, where the set P (A, B) obtained in step S4 is added to the trace T (C, D, F) obtained in step S3. B) is added and the scheduling target set T
´ In this case, the scheduling target set T is T ′
(C, D, F, A, B). Then, step S6
Then, it is checked whether or not there is a basic block which has not been scheduled yet in the set T in step S5, and steps S6, S7, S are performed until the basic block no longer exists.
The processing from step 8 is repeated. Scheduling target set T
After all the basic block scheduling according to steps S6, S7, and S8 for 'is completed, the process proceeds to step S2, and a basic block for which scheduling is to be executed is found in step S6.

【００５３】このステップＳ７の基本ブロックの検索
は、図９のように、制御フローグラフは一般に有効循環
グラフであるため、合流点の基本ブロックの命令スケジ
ューリングを行う時に先行基本ブロックの資源予約表を
参照できない場合がある。このように、合流点の基本ブ
ロックの命令スケジューリングで先行基本ブロックの資
源予約表を参照できない場合を減らすため、ステップＳ
３〜Ｓ６で求めたスケジューリング対象集合Ｔに含まれ
る複数の基本ブロックについて、ステップＳ７は次の優
先順位に従ってスケジューリング対象とする基本ブロッ
クを見つける。In the basic block search in step S7, as shown in FIG. 9, since the control flow graph is generally an effective circulation graph, the resource reservation table of the preceding basic block is used when the instruction scheduling of the basic block at the junction is performed. You may not be able to reference. As described above, in order to reduce the case where the resource reservation table of the preceding basic block cannot be referred to in the instruction scheduling of the basic block at the junction, step S
For a plurality of basic blocks included in the scheduling target set T obtained in 3 to S6, step S7 finds a basic block to be scheduled according to the following priority order.

【００５４】優先順位：先行基本ブロックが存在しな
い基本ブロック優先順位：全ての先行基本ブロックのスケジューリン
グが終了しているが、自己のスケジューリングが終了し
ていない基本ブロック優先順位：先行基本ブロックにスケジューリングが終
了した基本ブロックをもち、且つスケジューリングが終
了していない先行基本ブロックの数が最も少ない基本ブ
ロック図９について具体的に説明すると、例えばスケジューリ
ング対象集合ＴとしてＴ（Ａ，Ｂ，Ｃ，Ｄ，Ｅ，Ｆ）が
生成されていたとすると、優先順位に該当する先行基
本ブロックが存在しない基本ブロックは基本ブロックＡ
であり、まず基本ブロックＡが選択されてスケジューリ
ングが行われる。次に選択される基本ブロックは、優先
順位で示される全ての先行基本ブロックのスケジュー
リングは終了しているが自己のスケジューリングが終了
していない基本ブロックとなる基本ブロックＢが選択さ
れる。尚、基本ブロックＣは、スケジューリングが終了
していない先行基本ブロックＦをもつため、選択されな
い。Priority: a basic block having no preceding basic block. Priority: a basic block for which scheduling of all preceding basic blocks has been completed but its own scheduling has not been completed. Priority: scheduling has been performed for the preceding basic block. Basic blocks having completed basic blocks and having the least number of preceding basic blocks for which scheduling has not been completed. To be more specific, referring to FIG. 9, for example, as a scheduling target set T, T (A, B, C, D, E , F) are generated, the basic block having no preceding basic block corresponding to the priority order is the basic block A.
First, the basic block A is selected and scheduling is performed. As a basic block to be selected next, a basic block B is selected, which is a basic block in which scheduling of all preceding basic blocks indicated by the priorities has been completed but its own scheduling has not been completed. Note that the basic block C is not selected because it has a preceding basic block F for which scheduling has not been completed.

【００５５】優先順位は図９の制御フローグラフの分
岐点と合流点の間の複数系統のトレースに直接的に複数
の基本ブロックが存在する場合であり、並列に存在する
トレースの内、スケジューリングが終了していない先行
基本ブロックの数が最も少ないトレースの基本ブロック
を見つけてスケジューリングを行うことになる。このよ
うな図８のステップＳ７の命令スケジューリング即ち前
述した優先順位，，の順序で基本ブロックの命令
スケジューリングを行うことで、先行基本ブロックの資
源予約表を参照できない場合を低減することができる。The priority order is a case where a plurality of basic blocks are directly present in a plurality of traces between a branch point and a junction in the control flow graph of FIG. 9, and among the traces existing in parallel, the scheduling is not performed. The scheduling is performed by finding the basic block of the trace with the least number of unfinished preceding basic blocks. By performing the instruction scheduling of the basic block in the order of the instruction scheduling in step S7 of FIG. 8, that is, the above-described priority order, it is possible to reduce the case where the resource reservation table of the preceding basic block cannot be referred to.

【００５６】更に図７の処理にあっては、図９の制御フ
ローグラフの合流点の最適化のため、制御フローグラフ
の合流点の基本ブロックの状態をケース先行基本ブロックの命令スケジューリングが全て終了し
ている場合、ケース命令スケジューリングが終了していない先行基本ブロッ
クが存在する場合、の２つに分けて処理する。Further, in the processing of FIG. 7, the state of the basic block at the junction of the control flow graph is optimized in order to optimize the junction of the control flow graph of FIG. If there is a preceding basic block for which instruction scheduling has not been completed, the processing is divided into the following two cases.

【００５７】まず先行基本ブロックの命令スケジューリ
ングが全て終了しているケースの場合には、先行基本
ブロックの資源予約表を参照し、合流点の基本ブロック
のスケジューリングを行う。例えば図９の合流点の基本
ブロックＤにあっては、先行基本ブロックＡ，Ｂ，Ｃの
命令スケジューリングが全て終了している場合に合流点
の基本ブロックＤのスケジューリングを先行基本ブロッ
クの資源予約表を参照して行う。First, in the case where all the instruction scheduling of the preceding basic block has been completed, the scheduling of the basic block at the junction is performed with reference to the resource reservation table of the preceding basic block. For example, in the basic block D at the junction shown in FIG. 9, when the instruction scheduling of the preceding basic blocks A, B, and C has all been completed, the scheduling of the basic block D at the junction is performed by using the resource reservation table of the preceding basic block. To do.

【００５８】この場合、合流点の基本ブロックＤに複数
の先行基本ブロックＢ，Ｃが定義するレジスタの値を使
用するオペレーションが存在する場合には、可能ならば
先行基本ブロックＢまたはＣの空きスロットにレジスタ
間ムーブ命令を生成することで、基本ブロックＤの合流
点の入口でバイパスのタイミングを合わせる。図１０は
図９の合流点の基本ブロックＤに対する先行する２つの
基本ブロックＢ，Ｃの命令スロットを示しており、合流
点の基本ブロックＤの入口のスロット６４にあっては、
先行基本ブロックＢのスロット６０のソースオペランド
のレジスタ値と先行基本ブロックＣのスロット６２の同
じくソースオペランド値％ｒ１のレジスタ値を使用して
おり、基本ブロックＢの最終スロットは空きスロット６
１となって、基本ブロックＣのスロット６２に対しタイ
ミングがずれている。In this case, if there is an operation in the basic block D at the merging point using the values of the registers defined by the plurality of preceding basic blocks B and C, if possible, an empty slot of the preceding basic block B or C is used. , The bypass timing is adjusted at the entrance of the junction of the basic block D. FIG. 10 shows instruction slots of two preceding basic blocks B and C with respect to the basic block D at the junction in FIG. 9. In the slot 64 at the entrance of the basic block D at the junction,
The register value of the source operand of the slot 60 of the preceding basic block B and the register value of the source operand value% r1 of the slot 62 of the preceding basic block C are used.
As a result, the timing is shifted from the slot 62 of the basic block C.

【００５９】このような場合には図１１のように、基本
ブロックＢの空きスロット６６にレジスタ間ムーブ命令
を生成することで、合流点の基本ブロックＤの入口での
バイパスのタイミングを合わせる。この場合、空きスロ
ットに対するレジスタ間ムーブ命令の生成でバイパスの
タイミングを合わせることができない場合には、ソース
オペランドの値をレジスタから読むようにスケジューリ
ングを行う。In such a case, as shown in FIG. 11, by generating an inter-register move instruction in the empty slot 66 of the basic block B, the bypass timing at the entrance of the basic block D at the junction is adjusted. In this case, if the timing of the bypass cannot be adjusted by generating an inter-register move instruction for an empty slot, scheduling is performed so that the value of the source operand is read from the register.

【００６０】次にケースの合流点の基本ブロックに対
し命令スケジューリングが終了していない先行基本ブロ
ックが存在する場合の処理を説明する。例えば図９の制
御フローグラフにおいて、合流点の基本ブロックＤに対
し命令スケジューリングが終了していない先行基本ブロ
ックＣが存在する場合、命令スケジューリングが終了し
ていない先行基本ブロックＣ以前の命令によるレジスタ
への書込みが完了した後にそのレジスタの値を読むこと
を保証する必要がある。Next, a description will be given of the processing in the case where there is a preceding basic block for which instruction scheduling has not been completed for the basic block at the junction of cases. For example, in the control flow graph of FIG. 9, if there is a preceding basic block C for which the instruction scheduling has not been completed for the basic block D at the junction, a register by an instruction before the preceding basic block C for which the instruction scheduling has not been completed is stored. It is necessary to guarantee that the value of the register is read after the writing of the register is completed.

【００６１】このため、命令スケジューリングが終了し
ていない先行基本ブロックＣ以前の基本ブロックが定義
するソースオペランドを使用するオペレーションは、基
本ブロックＣ以前の基本ブロックによるレジスタへの書
込みが完了するサイクル以降に実行されるように合流点
の基本ブロックＤをスケジューリングする。それ以外の
レジスタへの書込みに関しては、ケースの先行基本ブ
ロックの命令スケジューリングが既に終了している場合
と同様の方法でバイパスのタイミング調整を行う。Therefore, the operation using the source operand defined by the basic block before the preceding basic block C in which the instruction scheduling is not completed is performed after the cycle in which the writing to the register by the basic block before the basic block C is completed. The basic block D at the junction is scheduled to be executed. For writing to the other registers, the timing of bypass is adjusted in the same manner as in the case where the instruction scheduling of the preceding basic block in the case has already been completed.

【００６２】次に図９のステップＳ８における基本ブロ
ックの１つ１つを対象として拡張資源予約表を作成し、
拡張資源予約表のオペレーション番号とバイパス情報か
らＶＬＩＷ命令を生成する処理を説明する。まず拡張資
源予約表の生成を説明する。今、一例として図１２のよ
うなオペレーション番号ＯＰ１，ＯＰ２，ＯＰ３で示さ
れるスロットＢの命令列の命令スケジューリングを考え
る。スロットＢの命令列の内容は、図２４（Ｂ）に示し
た命令内容と同じである。この図１２のスロットＢの命
令列を図３のソフトウェア・バイパス制御機構付のプロ
セッサで実行した場合のパイプラインサイクルは、図１
３のようになる。Next, an extended resource reservation table is created for each of the basic blocks in step S8 of FIG.
Processing for generating a VLIW instruction from the operation number and bypass information of the extended resource reservation table will be described. First, generation of the extended resource reservation table will be described. Now, as an example, consider the instruction scheduling of the instruction sequence of the slot B indicated by the operation numbers OP1, OP2, and OP3 as shown in FIG. The content of the instruction sequence in slot B is the same as the instruction content shown in FIG. A pipeline cycle when the instruction sequence in slot B of FIG. 12 is executed by the processor with the software bypass control mechanism of FIG.
It looks like 3.

【００６３】即ち、ＶＬＩＷ命令１のスロットＢについ
ては、サイクル１，サイクル２，サイクル３，サイクル
４でパイプラインステージＤ，Ｅ，Ｎ，Ｗと進み、Ｅ，
Ｎ，Ｗのそれぞれでバイパス出力が行われている。また
ＶＬＩＷ命令２のスロットＢについては、サイクル２，
３，４，５でパイプラインステージＤ，Ｅ，Ｎ，Ｗとな
る。更にオペレーション番号ＯＰ３のＶＬＩＷ命令３に
あっては、サイクル３，４，５，６でパイプラインステ
ージＤ，Ｅ，Ｎ，Ｗとなる。このパプラインの流れを、
バイパス出力をソースオペランドとして選択する場合を
含む命令実行時のパイプラインの流れとして表すと、図
１４のようになる。That is, for slot B of the VLIW instruction 1, in cycle 1, cycle 2, cycle 3, and cycle 4, the process proceeds to pipeline stages D, E, N, and W, and E,
The bypass output is performed in each of N and W. For slot B of VLIW instruction 2, cycle 2,
The pipeline stages D, E, N, and W are performed at 3, 4, and 5, respectively. Further, in the VLIW instruction 3 of the operation number OP3, the pipeline stages D, E, N, and W are set in cycles 3, 4, 5, and 6, respectively. The flow of this pipeline,
FIG. 14 shows a pipeline flow when executing an instruction including a case where a bypass output is selected as a source operand.

【００６４】図１４において、まずサイクルｎ＋０にあ
っては、スロットＢのＤステージにＶＬＩＷ命令１を投
入して解読し、次のサイクルｎ＋１のＥステージでバイ
パスＥに演算結果としてオペランドの値％ｒ２が出力さ
れる。このサイクルｎ＋１にあっては、同時にＶＬＩＷ
命令２がＤステージに投入され、ＶＬＩＷ命令２の演算
ではバイパスＥに出力されたオペランド値％ｒ２を選択
している。In FIG. 14, first, in the cycle n + 0, the VLIW instruction 1 is input to the D stage of the slot B and decoded, and in the next stage E of the cycle n + 1, the operand value% r2 is output to the bypass E as an operation result. Is output. In this cycle n + 1, at the same time, VLIW
The instruction 2 is input to the D stage, and the operation of the VLIW instruction 2 selects the operand value% r2 output to the bypass E.

【００６５】次のサイクルｎ＋２にあっては、ＶＬＩＷ
命令１はＮステージの演算となってバイパスＮに演算結
果を出力している。同時にサイクルｎ＋２にあっては、
ＶＬＩＷ命令２の演算結果がバイパスＥにオペランド％
ｒ３として出力される。更にサイクルｎ＋２にあって
は、ＶＬＩＷ命令３がＤステージに投入されており、Ｖ
ＬＩＷ命令１のバイパスＮから出力したオペランド％ｒ
２とＶＬＩＷ命令２がバイパスＥに出力したオペランド
％ｒ３を選択して読み込んでいる。In the next cycle n + 2, VLIW
Instruction 1 outputs an operation result to bypass N as an N-stage operation. At the same time, in cycle n + 2,
The operation result of VLIW instruction 2 is operand% in bypass E
Output as r3. Further, in the cycle n + 2, the VLIW instruction 3 is input to the D stage, and
Operand% r output from bypass N of LIW instruction 1
2 and the VLIW instruction 2 select and read the operand% r3 output to the bypass E.

【００６６】この図１４のようなスロットＢにおけるＶ
ＬＩＷ命令１，２，３におけるバイパスＥ，Ｎからのソ
ースオペランドの値の読込みを陽に指定するための本発
明のコンパイラによるアセンブリコードは、図１５
（Ａ）のようになる。この図１５（Ａ）のソースオペラ
ンドの値をバイパスあるいはレジスタから読み込むか否
かを陽に指定したアセンブリコードの内容は、図１５
（Ｂ）のようになる。V in slot B as shown in FIG.
The assembly code by the compiler of the present invention for explicitly specifying the reading of the value of the source operand from the bypasses E and N in the LIW instructions 1, 2, and 3 is shown in FIG.
(A). The contents of the assembly code explicitly specifying whether to bypass or read the value of the source operand from the register in FIG.
(B).

【００６７】まずＶＬＩＷ命令のスロットＢは、レジス
タＲのソースオペランド％ｒ０と％ｒ１を加算して％ｒ
２とする命令である。この場合には、バイパスからのソ
ースオペランドの読込みはない。次のＶＬＩＷ命令２に
あっては、ＶＬＩＷ命令１がバイパスＥに出力したソー
スオペランド％ｒ０を読み込んで測値Ａを加算して％ｒ
３とするものである。次のＶＬＩＷ命令２にあっては、
ＶＬＩＷ命令１の演算で出力したバイパスＮのソースオ
ペランド％ｒ２とＶＬＩＷ命令２の演算で出力したバイ
パスＥのソースオペランド％ｒ２を読み込んで加算して
％ｒ２とするものである。First, the slot B of the VLIW instruction is obtained by adding the source operands% r0 and% r1 of the register R to% r0.
It is an instruction to be 2. In this case, there is no reading of the source operand from the bypass. In the next VLIW instruction 2, the source operand% r0 output to the bypass E by the VLIW instruction 1 is read, and the measured value A is added.
3. In the next VLIW instruction 2,
The source operand% r2 of the bypass N output by the operation of the VLIW instruction 1 and the source operand% r2 of the bypass E output by the operation of the VLIW instruction 2 are read and added to obtain% r2.

【００６８】このような図１５（Ａ）のバイパス選択情
報を含むアセンブリコードを生成するため、図１６、図
１７、図１８の順番に処理して最終的に図１９のような
拡張資源予約表を基本ブロックごとに生成する。図１９
の拡張資源予約表は、第１リスト１８Ａと第２リスト１
８Ｂで構成される。この拡張資源予約表には、以下の手
順で第２リスト１８Ｂの命令の登録と第１リスト１８Ａ
の出力部分の更新を行う。In order to generate such an assembly code including the bypass selection information of FIG. 15A, the assembly code is processed in the order of FIGS. 16, 17 and 18, and finally the extended resource reservation table as shown in FIG. Is generated for each basic block. FIG.
Are the first list 18A and the second list 1
8B. In this extended resource reservation table, the instructions of the second list 18B are registered and the first list 18A is registered in the following procedure.
Update the output part of.

【００６９】サイクルＸで命令ｉがスロットＳを使用
することができ、且つ命令が使用するオペランドが出力
リストとなる第１リスト１９ＡのサイクルＸに存在する
ならば、命令ｉを第２リスト１８ＢのサイクルＸでスロ
ットＳに登録する。命令ｉをサイクルＸのスロットＳ上で実行することに
よる出力を第１リストに記述する。If the instruction i can use the slot S in the cycle X, and the operand used by the instruction exists in the cycle X of the first list 19A serving as the output list, the instruction i is stored in the second list 18B. Register in slot S in cycle X. The output of executing instruction i on slot S of cycle X is described in the first list.

【００７０】この張資源予約表の作成を具体的に説明す
ると次のようになる。図１６はスケジュール開始前の拡
張資源予約表の第１リスト１８Ａと第２リスト１８Ｂで
あり、この場合、オペレーション番号は不定であり、ま
たバイパス情報Ｅ，Ｎ，Ｗはなく、オペランド％ｒ０，
％ｒ１はレジスタ出力Ｒであることから、第１リスト１
８Ａに出力（？，The creation of the Zhang resource reservation table is specifically described as follows. FIG. 16 shows the first list 18A and the second list 18B of the extended resource reservation table before the start of the schedule. In this case, the operation number is undefined, there is no bypass information E, N, W, and the operands% r0,
Since% r1 is the register output R, the first list 1
Output to 8A (?,

【００７１】Ｒ）、（？，R), (?,

【００７２】Ｒ）が記述されている。図１７はサイクル
ｎ＋０における拡張資源予約表に対するＶＬＩＷ命令１
の登録と出力部分の更新であり、オペレーションＯＰ１
にあっては、スロットＢがサイクルｎ＋０で空いてい
て、且つソースオペランド％ｒ０と％ｒ１がレジスタか
ら読むことができるので、サイクル０のスロットＢにＶ
ＬＩＷ命令１を登録することができる。この場合、ソー
スオペランド％ｒ０，％ｒ１はバイパスからは読まない
ので、バイパス情報の付加はない。R) is described. FIG. 17 shows a VLIW instruction 1 for the extended resource reservation table in cycle n + 0.
And updating of the output part, and the operation OP1
Since slot B is empty at cycle n + 0 and source operands% r0 and% r1 can be read from the register, V
LIW instruction 1 can be registered. In this case, since the source operands% r0 and% r1 are not read from the bypass, no bypass information is added.

【００７３】続いて図１５のＶＬＩＷ命令１をサイクル
ｎ＋０のスロットＢ上で実行することによる出力を第１
リスト１８Ａに記述する。この場合には、オペレーショ
ン番号１と演算結果が出力されるバイパス情報Ｅ，Ｎ，
Ｗ、及び最終的なレジスタ書込みによるレジスタ出力Ｒ
の組を（１，Subsequently, the output obtained by executing VLIW instruction 1 of FIG.
This is described in List 18A. In this case, the operation number 1 and the bypass information E, N,
W and register output R by final register write
Set of (1,

【００７４】Ｅ）（１，E) (1,

【００７５】Ｎ）（１，N) (1,

【００７６】Ｗ）（１，W) (1,

【００７７】Ｒ）のように登録する。図１８は続いて行
うＶＬＩＷ命令２の拡張資源予約表の登録である。この
オペレーション番号ＯＰ２にあっては、スロットＢがサ
イクルｎ＋１で空いていて、第１リスト１８Ａから明ら
かなようにソースオペランド％ｒ２はバイパスＥから読
み込むことができるので、サイクルｎ＋１のスロットＢ
にスケジューリングできる。このオペレーション番号２
の第２リスト１８Ｂの登録時には、オペレーション番号
ＯＰ２と共に、使用したバイパス情報（％ｒ２，Register as shown in R). FIG. 18 shows the registration of the extended resource reservation table of the VLIW instruction 2 to be performed subsequently. In this operation number OP2, slot B is vacant in cycle n + 1, and source operand% r2 can be read from bypass E as apparent from first list 18A.
Can be scheduled. This operation number 2
When the second list 18B is registered, the bypass information (% r2,

【００７８】Ｅ）と一緒に保存し、アセンブリコード生
成時に使用する。続いて図１５のＶＬＩＷ命令１をサイ
クルｎ＋１のスロットＢ上で実行することによる出力を
第１リスト１８Ａに記述する。この場合には、オペレー
ション番号２と演算結果が出力されるバイパス情報Ｅ，
Ｎ，Ｗ、及び最終的なレジスタ書込みによるレジスタ出
力Ｒの組を（２，Stored together with E) and used when generating assembly code. Subsequently, an output obtained by executing the VLIW instruction 1 in FIG. 15 on the slot B in the cycle n + 1 is described in the first list 18A. In this case, the operation number 2 and the bypass information E, which outputs the operation result,
The set of N, W, and the register output R by the final register write is (2,

【００７９】Ｅ）（２，E) (2,

【００８０】Ｎ）（２，N) (2,

【００８１】Ｗ）（２，W) (2,

【００８２】Ｒ）のように登録する。図１９は図１４の
ＶＬＩＷ命令３のオペレーションにおける拡張資源予約
表の登録であり、オペレーション番号ＯＰ３にあって
は、スロットＢがサイクルｎ＋２で空いており、且つ第
１リスト１８Ａから明らかなように、ソースオペランド
％ｒ２と％ｒ３をそれぞれバイパスＮとバイパスＥから
読むことができるので、第２リスト１８Ｂのサイクルｎ
＋２のスロットＢにスケジューリングできる。Register as shown in R). FIG. 19 shows the registration of the extended resource reservation table in the operation of the VLIW instruction 3 in FIG. 14. In the operation number OP3, the slot B is vacant in the cycle n + 2, and as is clear from the first list 18A, Since the source operands% r2 and% r3 can be read from the bypass N and the bypass E, respectively, the cycle n of the second list 18B
+2 slot B can be scheduled.

【００８３】続いて図１５のＶＬＩＷ命令３をサイクル
ｎ＋２のスロットＢ上で実行することによる出力を第１
リスト１８Ａに記述する。この場合には、オペレーショ
ン番号３と演算結果が出力されるバイパス情報Ｅ，Ｎ，
Ｗ、及び最終的なレジスタ書込みによるレジスタ出力Ｒ
の組を（３，Subsequently, the output obtained by executing VLIW instruction 3 in FIG.
This is described in List 18A. In this case, the operation number 3 and the bypass information E, N,
W and register output R by final register write
Set of (3,

【００８４】Ｅ）（３，E) (3,

【００８５】Ｎ）（３，N) (3,

【００８６】Ｗ）（３，W) (3,

【００８７】Ｒ）のように登録する。このようにオペレ
ーション番号ＯＰ３を拡張資源予約表の第２リスト１８
Ｂに登録する際に、オペレーション番号ＯＰ３と使用し
たバイパス情報（％ｒ２Registration is performed as shown in R). As described above, the operation number OP3 is stored in the second list 18 of the extended resource reservation table.
B, the operation number OP3 and the bypass information (% r2

【００８８】Ｎ，％ｒ３N,% r3

【００８９】Ｅ）を一緒に保存し、アセンブリコード生
成時に使用する。この図１９のように作成した拡張資源
予約表から、結果として図１５（Ａ）のＶＬＩＷ命令列
を生成することができる。図２０は図１６〜図１９に示
した拡張資源予約表の生成と、これに基づくＶＬＩＷ命
令の生成処理であり、図９のステップＳ８の基本ブロッ
クのスケジューリングの詳細となる。E) is saved together and used when generating assembly code. From the extended resource reservation table created as shown in FIG. 19, the VLIW instruction sequence shown in FIG. 15A can be generated as a result. FIG. 20 shows the generation of the extended resource reservation table shown in FIGS. 16 to 19 and the generation processing of the VLIW instruction based on the table, and details the basic block scheduling in step S8 in FIG.

【００９０】図２０において、まずステップＳ１でスケ
ジューリング対象となっている基本ブロック内の命令の
命令依存グラフを作成する。この命令依存グラフは、命
令単位の実行手順を決めた有効グラフである。続いて、
サイクル数を示すＣＬＯＣＫを０にステップＳ２でクリ
アした後、ステップＳ３で基本ブロック内の全ての命令
を拡張資源予約表に登録したか否かチェックしている。In FIG. 20, first, at step S1, an instruction dependence graph of instructions in the basic block to be scheduled is created. This instruction dependence graph is an effective graph that determines an execution procedure for each instruction. continue,
After clearing CLOCK indicating the number of cycles to 0 in step S2, it is checked in step S3 whether all the instructions in the basic block have been registered in the extended resource reservation table.

【００９１】未登録がある場合には、ステップＳ４に進
み、登録していない命令で時刻ＣＬＯＣＫで示すサイク
ルに拡張資源予約表に登録可能な命令ｉが存在するか否
かチェックする。このような命令ｉが存在すると、ステ
ップＳ５で拡張資源予約表の時刻ＣＬＯＣＫで指定され
るサイクルの該当するスロットに命令ｉをオペレーショ
ン番号によって登録し、この場合、ソースオペランドが
バイパスから選択されている場合には、バイパス情報も
同時に登録する。If unregistered, the flow advances to step S4 to check whether there is an unregistered instruction i which can be registered in the extended resource reservation table in the cycle indicated by the time CLOCK. If such an instruction i exists, the instruction i is registered by the operation number in the corresponding slot of the cycle specified by the time CLOCK in the extended resource reservation table in step S5. In this case, the source operand is selected from bypass. In this case, the bypass information is also registered at the same time.

【００９２】ステップＳ４で現在時刻のＣＬＯＣＫの拡
張資源予約表に登録可能な命令ｉが存在しなくなった場
合には、ステップＳ６に進み、時刻ＣＬＯＣＫを１つア
ップし、ステップＳ３に戻って次の時刻ＣＬＯＣＫのサ
イクルでの拡張資源予約表の登録を行う。ステップＳ３
で基本ブロック内の全ての命令の拡張資源予約表に対す
る登録が終了すると、ステップＳ７に進み、登録が済ん
だ拡張資源予約表のオペレーション番号と、同時に登録
されているバイパス情報から、図１５（Ａ）のようなＶ
ＬＩＷ命令を生成する。If there is no longer any instruction i that can be registered in the extended resource reservation table of the CLOCK at the current time in step S4, the process proceeds to step S6, the time CLOCK is incremented by one, and the process returns to step S3 to return to the next step. The extended resource reservation table is registered in the cycle of the time CLOCK. Step S3
When the registration of all the instructions in the basic block in the extended resource reservation table is completed, the process proceeds to step S7, where the operation number of the registered extended resource reservation table and the bypass information registered at the same time are used as shown in FIG. V like
Generate an LIW instruction.

【００９３】図２２は、本発明のコンパイル実行プログ
ラムを記録したコンピュータ読み取り可能な記録媒体の
実施形態であり、この記録媒体には、ＣＤ−ＲＯＭやフ
ロッピーディスク等のリムーバブルな可搬型記憶媒体、
回線によりプログラムを提供するプログラム提供者の記
憶装置、更にプログラムをインストールした処理装置の
ＲＡＭやハードディスク等のメモリ装置がある。また記
録媒体によって提供されたプログラムは、処理装置にロ
ーディングされ、その主メモリ上で実行される。FIG. 22 shows an embodiment of a computer-readable recording medium on which the compile execution program of the present invention is recorded. The recording medium includes a removable portable storage medium such as a CD-ROM or a floppy disk,
There is a storage device of a program provider that provides a program via a line, and a memory device such as a RAM or a hard disk of a processing device in which the program is installed. The program provided by the recording medium is loaded into the processing device and executed on the main memory.

【００９４】尚、上記の実施形態は、図３のソフトウェ
ア・バイパス制御機構を備えたプロセッサにおけるスロ
ットＢの命令を例にとっているが、残りのスロットＡ，
Ｃ，Ｄについても全く同様に適用できる。また上記の実
施形態は、並列ＶＬＩＷ命令を実行する並列パイプライ
ン構成のソフトウェア・バイパス制御機構付きのプロセ
ッサを対象としたコンパイルを説明したが、実施形態で
詳細に示したスロットＢのように、単一のパイプライン
をもったソフトウェア・バイパス制御機構付きのプロセ
ッサを対象としたコンパイルについても、そのまま適用
できる。In the above embodiment, the instruction of the slot B in the processor having the software bypass control mechanism shown in FIG. 3 is taken as an example.
The same applies to C and D. Further, in the above-described embodiment, the compiling for the processor having the software bypass control mechanism of the parallel pipeline configuration for executing the parallel VLIW instruction has been described. Compiling for a processor having a software bypass control mechanism having one pipeline can be applied as it is.

【００９５】更に、上記の実施形態は、ＶＬＩＷ命令の
コンパイルを例にとっているが、命令の形態による限定
は受けない。Further, in the above-described embodiment, the compilation of the VLIW instruction is taken as an example, but the present invention is not limited by the form of the instruction.

【００９６】[0096]

【発明の効果】以上説明してきたように本発明によれ
ば、ソフトウェア・バイパス制御機構をもつプロセッサ
に対し、オペレーション番号にバイパス情報を付加した
拡張資源予約表を使用することで、ソースオペランドの
値としてバイパスを考慮した命令スケジューリングを行
うことができる。As described above, according to the present invention, by using an extended resource reservation table in which bypass information is added to an operation number for a processor having a software bypass control mechanism, the value of a source operand As a result, instruction scheduling in consideration of bypass can be performed.

【００９７】また本発明によれば、ソフトウェア・バイ
パス制御機構をもつプロセッサに特有の問題であるコン
パイルの際に生成される制御フローグラフの合流点での
バイパスの使用を、制御フローグラフの基本ブロックの
スケジューリングの順序を決定することによって効率良
く行うことができる。According to the present invention, the use of the bypass at the junction of the control flow graph generated at the time of compilation, which is a problem specific to a processor having a software bypass control mechanism, is described in the basic block of the control flow graph. Can be efficiently performed by determining the scheduling order.

[Brief description of the drawings]

【図１】本発明の原理説明図FIG. 1 is a diagram illustrating the principle of the present invention.

【図２】本発明によるコンパイル装置の機能ブロック図FIG. 2 is a functional block diagram of a compiling device according to the present invention.

【図３】本発明が対象とするソフトウェア・バイパス制
御機構を備えたプロセッサのブロック図FIG. 3 is a block diagram of a processor having a software bypass control mechanism to which the present invention is directed;

【図４】図３のプロセッサによるパイプラインの流れの
タイムチャートFIG. 4 is a time chart of a flow of a pipeline by the processor of FIG. 3;

【図５】スロットＢの命令を実行する図３の動作部分を
取出したブロック図FIG. 5 is a block diagram showing an operation part of FIG. 3 for executing an instruction in a slot B;

【図６】図５のスロットＢで実行される整数演算命令の
Ｄ，Ｅ，Ｎ，Ｗステージの動作説明図FIG. 6 is a diagram illustrating the operation of the D, E, N, and W stages of an integer operation instruction executed in slot B of FIG. 5;

【図７】本発明によるコンパイル処理手順のフローチャ
ートFIG. 7 is a flowchart of a compile processing procedure according to the present invention;

【図８】図７の命令スケジューリング処理における基本
ブロックの集合生成のフローチャートFIG. 8 is a flowchart for generating a set of basic blocks in the instruction scheduling process of FIG. 7;

【図９】本発明のコンパイルで生成する制御フローグラ
フの説明図FIG. 9 is an explanatory diagram of a control flow graph generated by compiling according to the present invention.

【図１０】図９の合流点に対する先行基本ブロックでの
タイミングずれの説明図FIG. 10 is an explanatory diagram of a timing shift in a preceding basic block with respect to the junction in FIG. 9;

【図１１】図９の合流点入口でのタイミング合せの説明
図FIG. 11 is an explanatory diagram of timing adjustment at the junction entrance in FIG. 9;

【図１２】バイパス制御情報を持たない命令スケジュー
リング前の命令列のアセンブリコードの説明図FIG. 12 is an explanatory diagram of an assembly code of an instruction sequence before instruction scheduling without bypass control information.

【図１３】ソフトウェア・バイパス制御機構を備えたプ
ロセッサのパイプラインの流れにおけるバイパスの選択
を示したタイムチャートFIG. 13 is a time chart showing bypass selection in a pipeline flow of a processor having a software bypass control mechanism.

【図１４】図１３の命令実行時のパイプライン流れを示
したタイムチャートFIG. 14 is a time chart showing a pipeline flow at the time of executing the instruction in FIG. 13;

【図１５】バイパス制御情報を持った命令スケジューリ
ング後のアセンブリコードの説明図FIG. 15 is an explanatory diagram of an assembly code after instruction scheduling having bypass control information.

【図１６】図１３の命令列の作成に使用する資源予約表
の初期状態の説明図16 is an explanatory diagram of an initial state of a resource reservation table used for creating the instruction sequence in FIG.

【図１７】図１３のＶＬＩＷ命令１に対する資源予約表
の作成説明図FIG. 17 is an explanatory diagram of creating a resource reservation table for the VLIW instruction 1 in FIG. 13;

【図１８】図１３のＶＬＩＷ命令２に対する資源予約表
の作成説明図FIG. 18 is an explanatory diagram of creating a resource reservation table for the VLIW instruction 2 in FIG.

【図１９】図１３のＶＬＩＷ命令３に対する資源予約表
の作成説明図FIG. 19 is an explanatory diagram for creating a resource reservation table for the VLIW instruction 3 in FIG. 13;

【図２０】図８の処理生成した集合に含まれる基本ブロ
ックのスケジューリングの詳細を示したフローチャート20 is a flowchart showing details of the scheduling of the basic blocks included in the set generated by the processing in FIG. 8;

【図２１】本発明のコンパイル実行プログラムを記録し
たコンピュータ読み取り可能な記録媒体の実施形態の説
明図FIG. 21 is an explanatory diagram of an embodiment of a computer-readable recording medium recording a compile execution program of the present invention.

【図２２】４並列パイプライン処理で実行されるＶＬＩ
Ｗ命令の構造説明図FIG. 22 is a VLI executed in four parallel pipeline processing.
Structure explanation diagram of W instruction

【図２３】従来のＶＬＩＷプロセッサによるパイプライ
ンの流れのタイムチャートFIG. 23 is a time chart of a pipeline flow by a conventional VLIW processor.

【図２４】従来のＶＬＩＷプロセッサ向けのコンパイラ
で使用する資源予約表の説明図FIG. 24 is an explanatory diagram of a resource reservation table used in a compiler for a conventional VLIW processor.

【図２５】図２４の資源予約表の作成に使用したＶＬＩ
Ｗ命令１〜３のアセンブリコードとその命令内容の説明
図FIG. 25 is a VLI used to create the resource reservation table of FIG. 24;
Explanatory drawing of the assembly code of W instructions 1 to 3 and the contents of the instruction

[Explanation of symbols]

１０：ソースプログラム１２：ソースファイル入力部１４：中間コード変換部１６：命令スケジューリング部１８：拡張資源予約表１８Ａ：第１リスト１８Ｂ：第２リスト２０：アセンブリコード変換部２２：アセンブリコード出力部２４：目的プログラム２６：命令演算部２８，３２：バスユニットインタフェース３０：コードＲＡＭ３４−１，３４−２：データＲＡＭ３８，４２：浮動小数点レジスタ４０：整数レジスタ４４，４６，５２，５４：浮動小数点演算パイプライン４８：整数演算パイプライン４８ａ：出力パス４８ｂ：Ｅバイパス４８ｃ：Ｎバイパス４８ｄ：Ｗバイパス４８ｅ：アドレスパス５０：加算器６０Ａ，６０Ｂ：ソースオペランド６２：即値パス６４：ロードバイパス 10: Source program 12: Source file input unit 14: Intermediate code conversion unit 16: Instruction scheduling unit 18: Extended resource reservation table 18A: First list 18B: Second list 20: Assembly code conversion unit 22: Assembly code output unit 24 : Object program 26: Instruction operation unit 28, 32: Bus unit interface 30: Code RAM 34-1, 34-2: Data RAM 38, 42: Floating point register 40: Integer register 44, 46, 52, 54: Floating point Operation pipeline 48: Integer operation pipeline 48a: Output path 48b: E bypass 48c: N bypass 48d: W bypass 48e: Address path 50: Adder 60A, 60B: Source operand 62: Immediate path 64: Load bypass

Claims

[Claims]

A plurality of instruction elements obtained by dividing an instruction are executed in parallel by pipeline processing. In the pipeline processing, an operation result of a preceding instruction is output to a bypass before being written to a register, and a source operand of a subsequent instruction is output. Compile to compile an instruction sequence to be executed by a processor with a bypass control mechanism that specifies whether to read a source operand of each instruction from a register or one of the plurality of bypass outputs in an assembly code. A saw program input unit for inputting a source program of an instruction sequence described by the VLIW instruction; an intermediate code conversion unit for converting the instruction sequence input at the source program input unit into an intermediate code instruction sequence; Creating a control flow graph of the instruction sequence converted to the intermediate code An instruction scheduling unit that registers an operation number and bypass information for specifying a bypass for reading a source operand in a write part of the resource reservation table for each basic block of the control flow graph to generate an extended resource reservation table; An assembly code conversion unit that converts an instruction including bypass information into an assembly code based on the operation number and the bypass information of the extended resource reservation table created by the unit; and a target program of the instruction sequence converted by the assembly code conversion unit. And a compiling device for outputting an assembly code.

2. The compiling device according to claim 1, wherein
The extended resource reservation table generated by the instruction scheduling unit includes, for the source operand value% r, a first list for registering a set of bypass information in which an operation number and an operation result are output for each execution cycle, and each instruction of a VLIW instruction. For the slot corresponding to the element, the second information in which the bypass information obtained from the first list is registered together with the operation number for each execution cycle.
And a list.

3. The compiling apparatus according to claim 1, wherein:
The instruction scheduling unit selects a trace from a set of basic blocks for which scheduling has not been completed in the control flow graph, sets the trace as a set to be scheduled, further has an edge joining the trace, and has not completed scheduling. A compiling apparatus, wherein, when a basic block exists, the basic block is added to the trace and included in the set to be scheduled, and a basic block is scheduled for each set to be scheduled.

4. The compiling device according to claim 3, wherein
The instruction scheduling unit, for a plurality of basic blocks included in the scheduling target set, in order to avoid a case where the resource reservation table of the preceding basic block cannot be referred to, a basic block having no preceding basic block, A basic block whose scheduling has been completed but its own scheduling has not been completed; a basic block which has a basic block whose scheduling has been completed as a preceding basic block and has the least number of preceding basic blocks whose scheduling has not been completed; A compiling device for sequentially selecting and repeating scheduling of basic blocks.

5. The compiling apparatus according to claim 1, wherein:
The instruction scheduling unit is configured to complete the instruction scheduling of a plurality of basic blocks preceding the basic block at the junction for the optimization of the control flow graph junction with respect to the basic blocks included in the scheduling target set. In this case, scheduling is performed with reference to the resource reservation table of the basic block at the junction, and in the scheduling of the basic block at the junction, an operation using the value of a register defined by a plurality of preceding basic blocks is performed. If it exists in the basic block at the junction, a move instruction between registers is generated in an empty slot of the preceding basic block to adjust the bypass timing at the entrance of the basic block at the junction, and adjust the bypass timing. If not, the value of the operand Compiling and wherein the scheduling to read from the register.

6. A compiling device according to claim 1, wherein:
The instruction scheduling unit completes instruction scheduling of a plurality of basic blocks preceding the basic block at the junction for optimization of a control flow graph junction with respect to a plurality of basic blocks included in the scheduling target set. If not,
Operations that use operands defined by basic blocks before the preceding basic block for which instruction scheduling has not been completed are scheduled to be executed after the cycle in which writing to the register is completed. Adjusts the bypass timing at the entrance of the basic block at the junction by generating a move instruction between registers in an empty slot of the preceding basic block, and if the bypass timing cannot be adjusted, the value of the operand And compiling a parallel VLIW instruction.

7. A plurality of instruction elements obtained by dividing an instruction are executed in parallel by pipeline processing, and in the pipeline processing, an operation result of a preceding instruction is output to a bypass before being written to a register, and a source operand of a subsequent instruction is written. Compile to compile an instruction sequence to be executed by a processor with a bypass control mechanism that specifies whether to read a source operand of each instruction from a register or one of the plurality of bypass outputs in an assembly code. In the method, a source program inputting step of inputting a source program as an instruction sequence described in the instruction, an intermediate code converting step of converting the instruction sequence input in the source file inputting step into an intermediate code instruction sequence, Create a control flow graph of the instruction sequence converted to the intermediate code. An instruction scheduling step of registering bypass information for specifying a bypass for reading an operation number and a source operand in a write portion of the resource write table for each basic block of the control flow graph to generate an extended resource reservation table; An assembly code conversion step of converting an instruction including bypass designation information into an assembly code based on the operation number and the bypass information of the extended resource reservation table created by the instruction scheduling unit; and an instruction sequence converted by the assembly code conversion unit An assembly code output process of outputting the assembly code of
A compilation method characterized by comprising:

8. The compiling method according to claim 7, wherein:
The extended resource reservation table generated in the instruction scheduling process includes, for the source operand value% r, a first list for registering a set of bypass information for outputting an operation number and an operation result for each execution cycle, and each instruction of a VLIW instruction. For the slot corresponding to the element, the second information in which the bypass information obtained from the first list is registered together with the operation number for each execution cycle.
And a list.

9. The compiling method according to claim 7, wherein:
In the instruction scheduling step, a trace is selected from a set of basic blocks for which scheduling has not been completed in the control flow graph to be a set to be scheduled, and further has an edge joining the trace, and scheduling has not been completed. A compile method comprising, when a basic block exists, adding the basic block to the trace and including the basic block in the set to be scheduled, and scheduling the basic block for each set to be scheduled.

10. The compiling method according to claim 9, wherein the instruction scheduling step is for avoiding a case where a resource reservation table of a preceding basic block cannot be referred to for a plurality of basic blocks included in the set to be scheduled. A basic block having no preceding basic block, a basic block for which scheduling of all preceding basic blocks has been completed but its own scheduling has not been completed, a basic block for which scheduling has been completed for the preceding basic block, and scheduling is not performed. A compilation method characterized by sequentially selecting basic blocks having the least number of unfinished preceding basic blocks and repeating basic block scheduling.

11. The compile method according to claim 7, wherein said instruction scheduling step comprises the steps of optimizing a converging point of a control flow graph for a plurality of basic blocks included in said set to be scheduled. If the instruction scheduling of a plurality of basic blocks preceding the block has all been completed, scheduling is performed by referring to the resource reservation table of the basic block at the junction. If an operation using the register value defined by a plurality of basic blocks is present in the basic block at the junction, a move instruction between registers is generated in an empty slot of the preceding basic block. Adjust the bypass timing at the block entrance to If it is not possible to adjust the grayed may compile wherein the scheduling to read the value of the operand from the register.

12. The compile method according to claim 7, wherein said instruction scheduling step includes the steps of: optimizing a control flow graph junction for a plurality of basic blocks included in said set to be scheduled; If the instruction scheduling of a plurality of basic blocks preceding the block has not been completed,
Operations that use operands defined by basic blocks before the preceding basic block for which instruction scheduling has not been completed are scheduled to be executed after the cycle in which writing to the register is completed. Adjusts the bypass timing at the entrance of the basic block at the junction by generating a move instruction between registers in an empty slot of the preceding basic block, and if the bypass timing cannot be adjusted, the value of the operand Compiling a schedule to read from a register.

13. A plurality of instruction elements obtained by dividing an instruction are executed in parallel by pipeline processing, and in the pipeline processing, an operation result of a preceding instruction is output to a bypass before being written to a register, and a source operand of a subsequent instruction is written. Compile to compile an instruction sequence to be executed by a processor with a bypass control mechanism that specifies whether to read a source operand of each instruction from a register or one of the plurality of bypass outputs in an assembly code. In a computer-readable recording medium on which an execution program is recorded, a saw program input unit for inputting a source program of an instruction sequence described in the VLIW instruction; An intermediate code conversion unit for converting into an instruction sequence; Create a control flow graph of the instruction sequence converted into the intermediate code, and register the operation number and bypass information for specifying the bypass for reading the source operand in the write part of the resource reservation table for each basic block of the control flow graph. An instruction scheduling unit that generates an extended resource reservation table by using the operation number and the bypass information of the extended resource reservation table created by the instruction scheduling unit. A computer-readable storage medium storing a compile execution program, comprising: an assembly code output unit that outputs a target program of an instruction sequence converted by the assembly code conversion unit.