JPH02130635A

JPH02130635A - Simultaneous processing system for plural instructions

Info

Publication number: JPH02130635A
Application number: JP28367988A
Authority: JP
Inventors: Michio Morioka (森岡 道雄); Kenichi Kurosawa (黒沢 憲一); Tadaaki Bando (坂東 忠秋)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1988-11-11
Filing date: 1988-11-11
Publication date: 1990-05-18
Anticipated expiration: 2010-07-31
Also published as: JPH0769824B2

Abstract

PURPOSE: To process a plurality of instructions simultaneously while preserving their sequential order, by executing combined instructions synchronously in pipelines formed between a plurality of operand read means and a plurality of arithmetic units. CONSTITUTION: An instruction fetch stage 500 reads a plurality of instructions at once from an instruction cache memory 520. In a predecode stage 501, the instructions are extracted simultaneously, and when a branch instruction is present, a branch prediction buffer 521 is accessed to determine the branch direction. In a decode-and-combine stage 503, two instructions are decoded simultaneously and, based on the decode result, it is decided whether they can be combined. Combined instructions are executed synchronously in two pipelines. The condition codes produced in each pipeline are merged in instruction order and reflected in a single status register. Sequential order is thus maintained while a plurality of instructions is processed simultaneously.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to an arithmetic processing unit that executes instructions sequentially, and in particular to an arithmetic processing unit that executes a plurality of instructions simultaneously. More specifically, it relates to an architecture that allows an arithmetic processing unit composed of a plurality of pipelines to execute several instructions at once while preserving their sequential order.

[Conventional technology]

Conventionally, the performance of general-purpose computers has been raised by deepening the pipeline. Each step needed to execute one instruction, such as instruction fetch, decode, operand address calculation, operand fetch, and the arithmetic operation, is made an independent stage, and different instructions are processed in different stages at the same time.

The multi-stage pipeline approach assumes that instructions are executed sequentially: unless a program control instruction such as a branch is executed, instructions run one after another in program counter order, and their execution order is never rearranged. The instruction sets of conventional general-purpose computers are specified on the assumption of this sequential execution.

Attempts to speed up a single processor by executing multiple instructions in parallel also have a long history. Examples include the CDC6600, described in "Parallel Operation in the Control Data 6600," Proc. of Spring Joint Computer Conference, 1964, and, among recent machines, Motorola's MC88100, described in "Supercomputing on Chip," VLSI Systems Design, May 1988, pp. 24-33. The "Multiple Execution Unit Uniprocessor System" described in JP-A-62-262142 is considered to have a similar architecture.

The CDC6600 and MC88100 restrict fixed-point and floating-point operations to data held in general-purpose registers. Data is moved between general-purpose registers and main memory only by dedicated load/store instructions. In addition, several arithmetic units are provided, each able to operate independently. This organization allows a transfer instruction between main memory and a general-purpose register to run in parallel with an arithmetic instruction, or several arithmetic instructions to run in parallel.

In this architecture, transfer instructions and arithmetic instructions are executed asynchronously. Asynchronous execution is effective at exposing the parallelism inherent in a program, but it raises several problems.

The first problem is that a complicated control mechanism is needed to preserve the sequential order of instructions.

Consider performing an operation on some data. The data is transferred from main memory to a general-purpose register, the operation is performed on the register, and the result is transferred from the register back to main memory. This is realized by three instructions (a load, an arithmetic instruction, and a store), which must take effect in that order. If each instruction is executed asynchronously, however, that order cannot be guaranteed.

For this reason, the CDC6600 architecture preserves instruction order with a scoreboard. Each general-purpose register entry is given an exclusive-control flag called a scoreboard bit. When an instruction is decoded, the flags of the general-purpose registers holding its operands are turned ON, and when the instruction completes, those flags are cleared. An instruction that tries to access a register whose scoreboard bit is ON is blocked until the flag is turned OFF. This preserves the instruction order described above.

The "Multiple Execution Unit Uniprocessor System" of JP-A-62-262142 likewise provides several arithmetic units that can execute multiple instructions asynchronously, and it adds an even more elaborate exclusive-control mechanism to the general-purpose registers to guarantee instruction order. Such exclusive control of the register file not only makes the hardware more complex but can also degrade performance. When a sequence of dependent operations such as load, operate, and store is executed, a conventional single pipeline guarantees instruction order, so exclusive control of the general-purpose registers can be simplified and loaded data can be passed directly to the arithmetic instruction without going through a general-purpose register. In the CDC6600 architecture, by contrast, the next arithmetic instruction cannot execute until the loaded data has been written to a general-purpose register and the scoreboard flag has been cleared. The overhead of exclusive control thus delays the passing of data between instructions.
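The scoreboard interlock described above can be sketched as follows. This is a simplified illustration, not the CDC6600's actual logic: it assumes one busy bit per register and, as a common simplification, flags only the destination (result) register at issue, so a later reader of that register is blocked until completion clears the bit.

```python
class Scoreboard:
    """Per-register busy bits (simplified; not the CDC6600's real design)."""

    def __init__(self, n_regs):
        self.busy = [False] * n_regs

    def can_issue(self, srcs, dst):
        # Blocked while any register the instruction touches is still flagged ON.
        return not any(self.busy[r] for r in (*srcs, dst))

    def issue(self, srcs, dst):
        if not self.can_issue(srcs, dst):
            return False
        self.busy[dst] = True  # reserve the result register
        return True

    def complete(self, dst):
        self.busy[dst] = False  # clear the flag when execution finishes
```

A load into r2 followed by an add that reads r2 shows the delay: the add cannot issue until `complete(2)` runs, which is the data-passing overhead the text describes.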

A second problem with asynchronous execution is the complexity of managing the processor state. In a conventional single pipeline, the execution order of instructions never changes; the state of the arithmetic processing unit changes in instruction order, and each change is reflected in the status register. As long as this holds, state management is easy. For example, when a conditional branch that depends on the processor state is executed, the results of all preceding instructions are guaranteed to be reflected in the status register when the condition is evaluated. Likewise, when an interrupt request arrives, the state of the processor at that moment can be determined easily, and after the interrupt has been serviced, the state at the point of interruption can easily be restored.

When instructions are executed asynchronously, on the other hand, there is no guarantee that they complete in order, and state management becomes complicated. For example, if the conditional branch above and the instruction that generates its condition are executed asynchronously, the result of the condition-generating instruction is not necessarily reflected in the status register when the branch is executed. To address this problem, Motorola's MC88100 passes the result status of the condition-generating instruction to the conditional branch through a general-purpose register, so that the register file's exclusive-control mechanism synchronizes the two instructions. This method, however, requires that the conditional branch instruction be able to name a general-purpose register as an operand; it cannot be used on a computer whose instruction set lacks such a specification.
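The idea of routing a condition through a general-purpose register can be sketched as below. The bit layout is hypothetical and is not the MC88100's actual compare-result format; the point is only that the compare writes an ordinary register and the branch names that register as an operand, so the register interlock orders the two instructions.

```python
# Hypothetical condition-bit positions (not the MC88100's real encoding).
EQ, LT, GT = 0, 1, 2


def compare(a, b):
    """Compare instruction: returns a word of condition bits for a GPR."""
    bits = 0
    if a == b:
        bits |= 1 << EQ
    if a < b:
        bits |= 1 << LT
    if a > b:
        bits |= 1 << GT
    return bits


def branch_on_bit(reg_value, bit):
    """Conditional branch that takes a GPR operand and tests one bit of it."""
    return bool((reg_value >> bit) & 1)


regs = {5: compare(3, 7)}        # compare result lands in hypothetical r5
taken = branch_on_bit(regs[5], LT)
```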

[Problem to be solved by the invention]

As described above, conventional simultaneous multiple-instruction processing schemes, which provide several distinct execution units and execute instructions asynchronously, need a complex exclusive-control mechanism in the general-purpose registers to guarantee the execution order of inherently sequential processing. That mechanism also increases the overhead of passing data between instructions and lowers performance. Furthermore, because instructions are executed asynchronously and not necessarily in order, managing the status register of the arithmetic processing unit becomes complicated.

JP-A-62-65133 is known as an example of executing multiple instructions simultaneously within one instruction execution time, but it does not disclose how the instructions are actually executed.

An object of the present invention is to provide an arithmetic processing unit and a simultaneous multiple-instruction processing system that can process multiple instructions at once while preserving their sequential order as written in the program.

Another object of the present invention is to provide a simultaneous multiple-instruction processing system that has at least one status register for a plurality of arithmetic units and guarantees that the status register is updated in the order in which instructions are written in the program.

Another object of the present invention is to provide a simultaneous multiple-instruction processing system that needs no exclusive-control mechanism in the general-purpose register file.

Another object of the present invention is to provide a simultaneous multiple-instruction processing system that can be built by replicating hardware of identical logic.

[Means to solve the problem]

These objects are achieved by a simultaneous multiple-instruction processing system in which a processor is built from a plurality of pipelines made of hardware of identical logic, and which comprises means for decoding a plurality of instructions simultaneously, means for determining whether the decoded instructions can be executed in parallel and, if so, combining them, and means for always executing the combined instructions synchronously in the plurality of pipeline processing units. A processor composed of multiple pipelines is provided with a single status register that indicates the status of arithmetic results.

The result statuses of the combined instructions are merged in program order and reflected in the status register at the same time. The register file and cache memory are shared by the pipelines and have multiple read/write ports so that operands can be supplied to each pipeline simultaneously.

[Operation]

The decoder decodes a plurality of instructions at once, whether they are fixed-length or variable-length instructions, and extracts and analyzes them.

The extracted instructions are checked for operand conflicts and their instruction types are compared to determine whether they can be executed in parallel. If so, they are combined and executed by the multiple pipelines. The combined instructions are executed synchronously: regardless of their complexity, they always occupy the same stage of their respective pipelines.

The result statuses in the pipelines are merged in instruction order and reflected in the single status register. Multiple instructions can therefore be processed simultaneously without changing their execution order. Because instruction order is guaranteed, exclusive control in the general-purpose registers can be simplified. And because the single status register is guaranteed to be updated in instruction order, managing the processor state for conditional branches and interrupt handling becomes easy.
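The in-order merge into one status register can be sketched as below. This is an illustration under assumptions: the flag set (zero, negative, carry, overflow) is hypothetical, each instruction is assumed to replace the whole condition code, and `None` marks an instruction that leaves the status unchanged. Applying the first instruction's update and then the second's yields exactly the state sequential execution would have produced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flags:
    """Hypothetical condition codes: zero, negative, carry, overflow."""
    z: bool = False
    n: bool = False
    c: bool = False
    v: bool = False


def merge_status(current: Flags, first: Optional[Flags], second: Optional[Flags]) -> Flags:
    """Apply the paired instructions' status updates in program order.

    None means that pipeline's instruction does not modify the status,
    so the later (second) update wins whenever it exists.
    """
    result = current
    for update in (first, second):  # first instruction, then second
        if update is not None:
            result = update
    return result
```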

[Embodiment]

An embodiment of the present invention is described below with reference to the drawings. FIG. 2 shows an example of a computer system to which the present invention is applied. Cluster computers 100, 110, and 120 are connected to a global memory 130 through global memory ports 131, 132, and 133, respectively.

The cluster computers share global memory 130, which is duplicated for high reliability. Each cluster computer is also connected through an I/O switching network 140 to magnetic disks 141 and 142 and to terminal devices 143 and 144. Inside cluster computer 100, arithmetic processing units 103, 104, 105, and 106 are connected to a shared memory 101 through a shared bus 102 and a memory port 108. Shared memory 101 stores the programs and data needed by each arithmetic processing unit. An arithmetic processing unit accesses input/output devices such as magnetic disks 141 and 142 through an input/output port 107.

Next, the internal configuration of arithmetic processing unit 103 is described in detail with reference to FIG. 3. Instruction cache memory 230 temporarily holds the instructions executed by arithmetic processing unit 103. Instruction fetch unit 200 reads instructions from instruction cache memory 230 and transfers them to instruction execution unit 210. Logical address 201 sent by instruction fetch unit 200 is translated into a physical address by instruction address translation buffer 220 and supplied to instruction cache memory 230. Instructions read from instruction cache memory 230 are supplied to instruction fetch unit 200 over bus 202. Instruction fetch unit 200 also contains a branch prediction buffer: when a branch instruction is detected among the fetched instructions, the unit accesses the branch prediction buffer to identify the branch destination address and steers instruction fetch accordingly. Operand cache memory 250 temporarily holds operands accessed by instruction execution unit 210. Operand address translation buffer 240 translates logical address 203 sent by instruction execution unit 210 into a physical address and sends it to operand cache memory 250. The instruction execution unit decodes the instructions received from instruction fetch unit 200 and, according to the decode result, performs operand address calculation, operand fetch, and the operation.

Shared bus monitor 260 watches transactions on shared bus 102 and, when necessary, invalidates or updates entries in operand cache memory 250. This keeps the operand cache memories of the multiple arithmetic processing units consistent.
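A bus-snooping monitor of this kind can be sketched as a toy model. The policy here is an assumption: write-through caches and invalidation only (the text also mentions updating entries). When one processor writes an address, every other cache on the bus drops its copy, so its next read refetches the new value from shared memory.

```python
class SnoopingCache:
    """Toy write-through operand cache with invalidate-on-snoop (assumed policy)."""

    def __init__(self):
        self.lines = {}  # address -> cached data

    def read(self, addr, memory):
        if addr not in self.lines:
            self.lines[addr] = memory[addr]  # miss: fill from shared memory
        return self.lines[addr]

    def write(self, addr, value, memory, bus):
        self.lines[addr] = value
        memory[addr] = value          # write-through to shared memory
        bus.broadcast(self, addr)     # the transaction is visible on the bus

    def snoop(self, addr):
        # Another processor wrote this address: discard any stale copy.
        self.lines.pop(addr, None)


class SharedBus:
    def __init__(self):
        self.caches = []

    def broadcast(self, writer, addr):
        for cache in self.caches:
            if cache is not writer:
                cache.snoop(addr)
```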

Next, instruction fetch unit 200 is described in detail with reference to FIG. 4. Fetch pointer 300 holds the address of the instruction to be fetched. While instructions are fetched sequentially, selector 302 selects adder 301, which adds a fixed increment to the fetch pointer. In this embodiment one instruction fetch reads 16 bytes, so fetch pointer 300 is incremented by 16. If the fetched instructions contain a branch instruction, selector 302 instead selects branch destination address 304, sent from branch prediction buffer 330 or from the instruction execution unit, and sets it in fetch pointer 300. Instructions read from instruction cache memory 230 at the address in fetch pointer 300 are stored in instruction buffer 310 through selector 303. Instruction buffer 310 is a first-in, first-out buffer, assumed here to be 16 bytes x 8 entries. Register 312 is the read address register of instruction buffer 310.
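The fetch-pointer and FIFO behavior can be sketched as follows. The 16-byte fetch width and 8-entry buffer come from the text; representing the instruction cache as a flat `bytes` object, stalling when the buffer is full, and clearing the buffer on a redirect (as the branch prediction description further on states) are modeling assumptions.

```python
from collections import deque

FETCH_WIDTH = 16    # bytes per fetch (from the text)
BUFFER_ENTRIES = 8  # 16 bytes x 8 entries (from the text)


class FetchUnit:
    def __init__(self, icache, start):
        self.icache = icache      # flat instruction memory (assumed model)
        self.fetch_ptr = start    # fetch pointer 300
        self.buffer = deque()     # instruction buffer 310 (FIFO)

    def cycle(self, branch_target=None):
        if branch_target is not None:
            # Selector 302 picks the branch address; queued bytes are discarded.
            self.fetch_ptr = branch_target
            self.buffer.clear()
        if len(self.buffer) < BUFFER_ENTRIES:  # otherwise stall this cycle
            line = self.icache[self.fetch_ptr:self.fetch_ptr + FETCH_WIDTH]
            self.buffer.append((self.fetch_ptr, line))
            self.fetch_ptr += FETCH_WIDTH      # adder 301: +16
```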

Read address register 312 points to an arbitrary byte position in instruction buffer 310, and aligner 311 reads 16 bytes starting at that position and sends them to decoder unit 314. Instruction extraction unit 315 sends the size of the extracted instructions to adder 313, which determines the new value of read address register 312. Decoder unit 314 decodes the 16 bytes read from the instruction buffer with multiple decoders, each as wide as the minimum instruction unit. Here the minimum instruction unit is 2 bytes, so the 16 bytes are decoded simultaneously by eight decoders, one per 2 bytes. The analysis results of the eight decoders are transferred to instruction extraction unit 315. Using this information, instruction extraction unit 315 extracts first instruction 319 and identifies its size 316, and extracts second instruction 325 and identifies its size 317. In this example two instructions are extracted at once, but a configuration that extracts more than two instructions simultaneously is naturally also possible. With this decoding scheme, multiple instructions can be extracted at the same time even when the instructions are of variable length. The extracted first instruction 319 and second instruction 325, together with their size information 316 and 317, are stored simultaneously in execution unit instruction buffer 340.
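The parallel-decode-then-chain idea can be sketched as below. The 2-byte parcel and 16-byte window come from the text; the length encoding (top two bits of an instruction's first byte give its length in parcels minus one) is purely hypothetical. Every decoder reports a length "as if an instruction started here", and the extractor simply uses the first result and the result at the parcel where the first instruction ends, discarding the rest.

```python
PARCEL = 2   # minimum instruction unit in bytes (from the text)
WINDOW = 16  # bytes delivered by aligner 311 (from the text)


def parcel_length(first_byte):
    """Hypothetical encoding: the top two bits of an instruction's first byte
    give its length in 2-byte parcels minus one, i.e. 2, 4, 6 or 8 bytes."""
    return ((first_byte >> 6) + 1) * PARCEL


def extract_two(window):
    """Extract the first and second instructions from a 16-byte window."""
    if len(window) != WINDOW:
        raise ValueError("aligner must supply exactly 16 bytes")
    # Eight decoders, one per parcel; in hardware they all run in parallel.
    lengths = [parcel_length(window[off]) for off in range(0, WINDOW, PARCEL)]
    size1 = lengths[0]                # the first instruction starts at offset 0
    size2 = lengths[size1 // PARCEL]  # the second starts right after the first
    return window[:size1], size1, window[size1:size1 + size2], size2
```

Because the longest instruction is assumed to be 8 bytes, two instructions always fit in the 16-byte window; the read pointer then advances by `size1 + size2`, as adder 313 does.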

Program counter 320 holds the main-memory address of the first instruction extracted by decoder unit 314. The address of the second instruction is obtained by adding size 316 of the first instruction to program counter 320 in adder 323. The main-memory addresses of the first and second instructions are stored alongside them as additional information when the instructions are stored in execution unit instruction buffer 340. Unless a branch occurs, program counter 320 is updated by adding, in adder 321, the sum 318 of the sizes of the first and second instructions sent from instruction extraction unit 315. When a branch instruction changes the program flow, selector 322 selects the predicted address from branch prediction buffer 330 or branch destination address 325 from the instruction execution unit and sets it in program counter 320.
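The program counter arithmetic above reduces to two additions and a select. The function below is a minimal sketch of that data flow; its name and signature are illustrative only.

```python
def next_pcs(pc, size1, size2, taken_target=None):
    """Return (address of 1st instruction, address of 2nd instruction, next PC)."""
    addr1 = pc
    addr2 = pc + size1                  # adder 323
    if taken_target is not None:        # selector 322: predicted or resolved target
        return addr1, addr2, taken_target
    return addr1, addr2, pc + size1 + size2  # adder 321 adds the size sum 318
```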

Next, branch prediction buffer 330 is described.

Branch prediction buffer 330 stores the following five items:

1) Valid bit 331: indicates that the entry is valid.

2) Comparison address tag 332: holds part of the address of a branch instruction. Comparator 336 compares it with an externally supplied address to check whether the branch instruction is present in branch prediction buffer 330.

3) Branch prediction bit 333: if the branch instruction is a conditional branch, indicates whether the branch is predicted to be taken.

4) Branch destination instruction address 334: the address of the instruction to which the branch instruction is predicted to branch.

5) Branch destination instruction 335: the branch destination instruction itself.

Branch prediction buffer 330 records the history of executed branch instructions and, when the same branch instruction appears again, predicts its destination. It operates as follows. When an instruction extracted by instruction extraction unit 315 is a branch instruction, selector 324 is controlled according to whether it is the first or second instruction, and the branch instruction's address is sent to branch prediction buffer 330. The buffer uses that address to select an entry, and comparator 336 compares the entry's address tag 332 to check whether the branch instruction is registered. If it is registered and branch prediction bit 333 indicates taken, branch destination address 334 is set in program counter 320 and fetch pointer 300, and instruction buffer 310 is cleared entirely. Branch destination instruction 335 is then stored in instruction buffer 310 through selector 303.

If, on the other hand, branch prediction bit 333 indicates not taken, no operation is performed.
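The lookup and update just described can be sketched as a direct-mapped table holding the five listed fields. The entry count (256 by default) and the index/tag split (low-order address bits select the entry, the remaining bits form the tag) are assumptions; the text does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BTBEntry:
    valid: bool = False        # 1) valid bit 331
    tag: int = 0               # 2) comparison address tag 332
    taken: bool = False        # 3) branch prediction bit 333
    target_addr: int = 0       # 4) branch destination instruction address 334
    target_insn: bytes = b""   # 5) branch destination instruction 335


class BranchPredictionBuffer:
    def __init__(self, n_entries=256):  # entry count is an assumption
        self.n = n_entries
        self.entries: List[BTBEntry] = [BTBEntry() for _ in range(n_entries)]

    def _split(self, addr):
        # Assumed split: low-order bits index the table, the rest form the tag.
        return addr % self.n, addr // self.n

    def lookup(self, branch_addr) -> Optional[BTBEntry]:
        idx, tag = self._split(branch_addr)
        e = self.entries[idx]
        if e.valid and e.tag == tag and e.taken:  # comparator 336 + bit 333
            return e
        return None  # miss or predicted not taken: no operation

    def update(self, branch_addr, taken, target_addr, target_insn):
        idx, tag = self._split(branch_addr)
        self.entries[idx] = BTBEntry(True, tag, taken, target_addr, target_insn)
```

On a hit, the caller would load `target_addr` into the program counter and fetch pointer, clear the instruction buffer, and enqueue `target_insn`, mirroring the sequence in the text.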

Next, instruction execution unit 210 is described in detail with reference to FIG. 5. The example in FIG. 5 executes two instructions simultaneously, but a configuration that executes more than two simultaneously is also easily realized. Two instructions are read at once from execution unit instruction buffer 340, and decoders 400 and 401 identify the instruction types, operand types, and so on.

Here, decoder 400 decodes the instruction to be executed first (the first instruction), and decoder 401 decodes the instruction to be executed after it (the second instruction). This information is sent to instruction combination determination unit 402, which checks the instruction types, operand conflicts, and so on, and determines whether the two instructions read from execution unit instruction buffer 340 can be combined. The combinable instruction types are shown in FIG. 7. Most pairs of instructions can be combined, but bit field instructions, decimal arithmetic instructions, and the like cannot be combined with any other instruction. Two branch instructions cannot be combined with each other, nor can two subroutine link instructions. Nor can two instructions be combined if the destination operand of one is a source operand of the other.

A pair of instructions judged combinable by the instruction combination determination unit 402 is executed synchronously in each of the subsequent pipeline stages: address calculation, operand fetch, and operation. If the pair is judged not combinable, only the first instruction is passed to the subsequent stage; the remaining instruction becomes the next first instruction, is decoded together with the instruction that follows it, and is judged for combination again.
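The pairing behavior just described can be sketched as a loop over a program-order stream. The function name `form_issue_groups` is hypothetical, and the `can_combine` argument stands in for whatever check unit 402 performs; the toy predicate in the usage line is purely illustrative.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def form_issue_groups(insns: Sequence[T],
                      can_combine: Callable[[T, T], bool]) -> List[Tuple[T, ...]]:
    """Split a program-order stream into issue groups of one or two instructions."""
    groups: List[Tuple[T, ...]] = []
    i = 0
    while i < len(insns):
        first = insns[i]
        if i + 1 < len(insns) and can_combine(first, insns[i + 1]):
            groups.append((first, insns[i + 1]))  # pair runs in both pipelines
            i += 2
        else:
            groups.append((first,))  # first runs alone; the next one becomes the new first
            i += 1
    return groups


# Usage with a toy predicate that rejects only the pair (i3, i4):
stream = ["i1", "i2", "i3", "i4", "i5"]
print(form_issue_groups(stream, lambda a, b: (a, b) != ("i3", "i4")))
# [('i1', 'i2'), ('i3',), ('i4', 'i5')]
```

In the usage line the third instruction issues alone and the fourth pairs with the fifth, mirroring the case described later for FIG. 9.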

For each combined pair, the decoding result of the first instruction is set in register group 410-414, and that of the second instruction in register group 415-419. Register operand address registers 410 and 411 hold the register addresses of the source and destination operands of the first instruction, respectively.

For the second instruction, registers 418 and 419 serve the same function. If the first instruction includes a memory operand, the register address of the base register is stored in register 414, the register address of the index register in register 413, and the displacement information in register 412. For the second instruction, registers 415, 416, and 417 serve the same functions.

Next, the processing in the address calculation stage will be explained. If the first instruction includes a memory operand, its logical address must be calculated. The address of the memory operand is obtained by adding, in adder 421, the contents of the base register in the address register file 420 specified by register 414, the contents of the index register in the address register file 420 specified by register 413, and the displacement information 412; the result is stored in logical address register 425. The same processing is performed for the second instruction, and its logical address is stored in register 426. Because the address register file 420 is shared by the first and second instructions and has multiple read ports, the address calculations for the first and second instructions can be performed simultaneously.
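A minimal sketch of the address calculation for both slots, with the register file modeled as a dictionary and 32-bit wraparound arithmetic (both assumptions; the text does not state an address width). Each slot's (base, index, displacement) triple corresponds to registers 414, 413, and 412 for the first instruction and to the equivalent registers for the second; the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

ADDR_MASK = 0xFFFFFFFF  # assumption: 32-bit logical addresses wrap around

# (base register, index register, displacement); None means no memory operand
MemSpec = Optional[Tuple[Optional[str], Optional[str], int]]


def logical_address(regfile: Dict[str, int], base: Optional[str],
                    index: Optional[str], disp: int) -> int:
    """base + index + displacement, as formed by adder 421 for one slot."""
    b = regfile[base] if base is not None else 0
    x = regfile[index] if index is not None else 0
    return (b + x + disp) & ADDR_MASK


def address_stage(regfile: Dict[str, int], slot1: MemSpec,
                  slot2: MemSpec) -> Tuple[Optional[int], Optional[int]]:
    """Compute both slots' logical addresses from the shared register file in one step."""
    la1 = logical_address(regfile, *slot1) if slot1 is not None else None
    la2 = logical_address(regfile, *slot2) if slot2 is not None else None
    return la1, la2  # corresponding to logical address registers 425 and 426
```

Reading the same dictionary for both slots in one call plays the role of the multiple read ports: neither slot waits for the other.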

Next, the memory operand read stage will be explained. If the first instruction includes a memory operand, memory is accessed using the logical address 425 obtained in the address calculation stage. The logical address 425 is translated into a physical address in main memory by the operand address translation buffer 430. The operand cache memory 431 is accessed with this physical address, and the memory operand read out is stored in register 434. The memory operand of the second instruction is likewise stored in register 435. The operand address translation buffer 430 and the operand cache memory 431 are shared by the first and second instructions and have multiple read ports, so the memory operands of both instructions can be read simultaneously. Note, however, that if, for example, a cache miss occurs for the first instruction so that it cannot move to the next stage, the second instruction cannot move to the next stage either.
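The shared translation-and-cache read and its lockstep stall can be sketched as follows. The page size, the dictionary models of translation buffer 430 and cache 431, and the function names are assumptions. The text explicitly states only that a miss on the first instruction holds the second; treating a miss in either slot as holding the pair is this sketch's assumption, consistent with the synchronous execution of combined pairs.

```python
from typing import Dict, Optional, Tuple

PAGE_BITS = 12  # assumption: 4 KiB pages


def translate(tlb: Dict[int, int], la: int) -> Optional[int]:
    """Logical to physical address through the shared translation buffer; None on a miss."""
    frame = tlb.get(la >> PAGE_BITS)
    if frame is None:
        return None
    return (frame << PAGE_BITS) | (la & ((1 << PAGE_BITS) - 1))


def operand_read_stage(tlb: Dict[int, int], cache: Dict[int, int],
                       la1: Optional[int], la2: Optional[int]
                       ) -> Tuple[bool, Optional[int], Optional[int]]:
    """Read both slots' memory operands; return (advance, operand1, operand2).

    A slot with la=None has no memory operand. If either slot misses in the
    translation buffer or the cache, the whole pair stays in this stage.
    """
    operands = []
    for la in (la1, la2):
        if la is None:
            operands.append(None)
            continue
        pa = translate(tlb, la)
        if pa is None or pa not in cache:
            return False, None, None  # miss: neither instruction moves on
        operands.append(cache[pa])
    return True, operands[0], operands[1]
```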

Next, the configuration of the operand fetch stage will be explained. If an operand of the first instruction is a register, the operand is read from the data register file 440 according to the information in register operand address registers 432 and 433. If it is a memory operand, the operand is obtained from register 434 via aligner 441.

When the source operand is the result of the immediately preceding instruction executed in the same pipeline, the operand is obtained through the intra-pipe bypass route 460. When the source operand is the result of the immediately preceding instruction executed in the other pipeline, it is obtained through the inter-pipe bypass route 461. The same processing is performed for the second instruction.
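Operand source selection for a register operand can be sketched as a priority check. The sketch assumes that each pipeline latches the (destination, value) of the instruction it executed in the previous cycle, and that when both previous instructions wrote the same register, the later one in program order (the second pipeline's) wins; these details and the function name are assumptions, not taken from the text.

```python
from typing import Dict, Optional, Tuple

# What each pipeline's previous instruction produced: (destination register, value),
# or None if it wrote no register. Index 0 = first pipeline, 1 = second pipeline.
Forward = Optional[Tuple[str, int]]


def fetch_register_operand(reg: str, my_pipe: int, regfile: Dict[str, int],
                           prev: Tuple[Forward, Forward]) -> Tuple[int, str]:
    """Return (value, route) for a register source operand of the instruction in my_pipe."""
    # The second pipeline's previous instruction is later in program order,
    # so check it first in case both previous instructions wrote `reg`.
    for pipe in (1, 0):
        fwd = prev[pipe]
        if fwd is not None and fwd[0] == reg:
            route = "intra-pipe bypass 460" if pipe == my_pipe else "inter-pipe bypass 461"
            return fwd[1], route
    return regfile[reg], "register file 440"
```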

In arithmetic units 454 and 455, operations are performed on the operands obtained in the operand fetch stage, and the results are stored in registers 456 and 457. The results are then stored in the address register file 420, the data register file 440, or the operand cache memory 431. The states 462 and 463 of the operation results in arithmetic units 454 and 455 (ZERO, OVERFLOW, etc.) are transferred to the status code generation circuit 458 and reflected in the status register 459.

Next, the status code generation circuit will be explained in detail with reference to FIG. 6. The status code generation circuit 458 has two functions. The first is to merge the state of the operation result of the first instruction with that of the second instruction, taking the instruction order into account, and to reflect the merged state in the status register. The second is a condition determination function that allows a conditional branch instruction and the instruction that generates its condition to be processed simultaneously. For the first function, the state 462 of the operation result output from the first instruction's arithmetic unit 454 and the state 463 output from the second instruction's arithmetic unit 455 are input to the state generation unit 918.

Since the first instruction is to be executed before the second instruction, the state generation unit 918 reflects the state 463 from the second instruction on top of the state 462 from the first instruction and then stores the result in the status register 459.
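The first function of circuit 458, merging the two states in program order, can be sketched with flags held in dictionaries. Representing each instruction's state as only the flags it actually sets, and flag names such as `Z` and `V`, are assumptions for illustration; the function name is hypothetical.

```python
from typing import Dict

Flags = Dict[str, bool]  # hypothetical flag names, e.g. "Z" (zero), "V" (overflow)


def merge_status(status: Flags, first: Flags, second: Flags) -> Flags:
    """New status register contents after a combined pair, in program order.

    `first` and `second` hold only the flags each instruction actually sets
    (states 462 and 463); flags absent from both keep their old value.
    """
    merged = dict(status)
    merged.update(first)   # the first instruction's state is applied first ...
    merged.update(second)  # ... and the second instruction's state overrides it
    return merged
```

The result is the same status register contents the two instructions would leave if executed one after the other, which is what keeps the single status register consistent with sequential execution.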

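Next, the second function will be explained. Assuming that a conditional branch instruction is executed in the second instruction's pipeline, its branch condition determination information is stored in register 904, and the prediction result from the branch prediction buffer is stored in register 905.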
Assuming that the instruction is executed in the pipeline of the second instruction, 1-branch condition determination information is stored in the register 904, and the prediction result by the branch prediction buffer is stored in the register 905.

Suppose first that the instruction that generates the branch condition and the conditional branch instruction are executed sequentially. By the time the conditional branch instruction executes, the branch condition has already been reflected in the status register. The selector 914 therefore selects the status register 459 and inputs it to the branch determination circuit 915, which decides whether to branch based on the status register 459 and the branch condition determination information 904. Comparator 916 compares this result with the branch prediction result 905: if they match, no operation is performed; if they do not match, all pipelines are canceled and execution branches in the correct direction.

Now consider the case in which the instruction that generates the branch condition and the conditional branch instruction are executed simultaneously, with the condition-generating instruction as the first instruction and the conditional branch instruction as the second. In this case, when the conditional branch instruction executes, the branch condition has not yet been reflected in the status register 459. The selector 914 therefore selects the state 462 of the first instruction's operation result and inputs it to the branch determination circuit 915, which decides whether to branch based on state 462 and the branch condition determination information 904. Comparator 916 compares this result with the branch prediction result 905: if they match, no operation is performed; if they do not match, all pipelines are canceled and execution branches in the correct direction.
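The second function, selector 914 feeding branch determination circuit 915 and comparator 916, can be sketched as below. The flag names, the `condition` callback, and the returned action labels are hypothetical, and the sketch assumes state 462 is a complete flag set.

```python
from typing import Callable, Dict, Tuple

Flags = Dict[str, bool]  # hypothetical flag names


def resolve_branch(status_reg: Flags, first_state: Flags, paired_with_producer: bool,
                   condition: Callable[[Flags], bool],
                   predicted_taken: bool) -> Tuple[str, bool]:
    """Return (action, actual_taken) for a conditional branch in the second pipeline.

    paired_with_producer: True when the condition-generating instruction is the
    first instruction of the same combined pair, so its result is not yet in
    the status register.
    """
    # Selector 914: forwarded state 462 or status register 459.
    source = first_state if paired_with_producer else status_reg
    taken = condition(source)                 # branch determination circuit 915
    if taken == predicted_taken:              # comparator 916
        return "no-op", taken
    return "cancel all pipelines and redirect", taken


# Usage: a hypothetical branch-if-zero paired with the compare that sets Z.
beq = lambda f: f["Z"]
print(resolve_branch({"Z": False}, {"Z": True}, True, beq, True))  # ('no-op', True)
```

Without the selector, the paired branch would read the stale `Z` from the status register and be resolved as not taken.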

Through these functions of the status code generation circuit 458, the status register 459 is guaranteed to be updated in instruction order, and a conditional branch instruction and the instruction that generates its branch condition can be executed simultaneously.

Next, the pipeline operation will be explained with reference to FIGS. 1, 8, and 9. FIG. 1 shows the pipeline configuration of this embodiment, which realizes simultaneous processing of two instructions.

The instruction fetch stage 500 reads multiple instructions simultaneously from the instruction cache memory 520. The predecode stage 501 extracts multiple instructions simultaneously and, if there is a branch instruction, accesses the branch prediction buffer 521 to determine the branch direction. The instruction buffer stage 502 reads instructions from the execution-unit instruction buffer. The decode-and-combine stage 503 decodes two instructions simultaneously and, based on the result, determines whether they can be combined. The address calculation stages 504 and 511 calculate the logical addresses of memory operands. The address translation stages 506 and 512 translate the logical addresses of memory operands into physical addresses.

The operand fetch stages 508 and 515 read operands from the operand cache memory 523 or the register file 522.

The operation stages 509 and 516 perform operations on the operands that were read. The write stages 510 and 517 store the operation results in the operand cache memory 523 or the register file 522.

From the address calculation stage onward, the pipeline consists of two pipelines with identical logic. The instructions combined in the decode-and-combine stage 503 are executed synchronously in these two pipelines.
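The stall-free timing of the duplicated back end can be sketched as follows, assuming one issue group enters decode-and-combine per cycle and each stage takes one cycle (both assumptions). Only the stages named in the text are listed, FIG. 1 may contain others, and the function names are hypothetical. The point illustrated is that both members of a combined pair occupy the same stage in the same cycle.

```python
from typing import Dict, Sequence, Tuple

# Stages from decode-and-combine onward that the text names; "x/y" gives the
# reference numerals of the two identical pipelines.
STAGES = [
    "decode-and-combine 503",
    "address calculation 504/511",
    "address translation 506/512",
    "operand fetch 508/515",
    "operation 509/516",
    "write 510/517",
]


def stage_timeline(groups: Sequence[Tuple[str, ...]]) -> Dict[str, Dict[str, int]]:
    """{instruction: {stage: cycle}} with one issue group entering per cycle, no stalls."""
    timeline: Dict[str, Dict[str, int]] = {}
    for k, group in enumerate(groups):
        for insn in group:
            # Both members of a combined pair sit in the same stage in the same cycle.
            timeline[insn] = {stage: k + s for s, stage in enumerate(STAGES)}
    return timeline


def total_cycles(groups: Sequence[Tuple[str, ...]]) -> int:
    """Cycles from the first group's decode to the last group's write."""
    return len(groups) + len(STAGES) - 1 if groups else 0
```

Under these assumptions, five instructions formed into three groups finish in 3 + 6 - 1 = 8 cycles, against 5 + 6 - 1 = 10 if each instruction issued alone.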

FIG. 8 shows the pipeline stage flow when simultaneous processing of two instructions is performed efficiently. Here, the simultaneous execution of the third and fourth instructions is made possible by transferring the state of the operation result between the pipelines, as described above. The parallel execution of the ninth and tenth instructions is made possible by a successful branch prediction for the ninth instruction, which is a subroutine jump.

FIG. 9 also shows a pipeline stage flow in the two-instruction simultaneous processing method. The third and fourth instructions illustrate a case in which the instructions could not be combined because of a conflict on register d7.

In this case, only the third instruction is executed alone, and the fourth instruction is combined with the fifth and executed. The eighth and ninth instructions were combined successfully, but a conflict on register a0 between the seventh and eighth instructions forces the eighth instruction to wait. The ninth instruction, which was combined with the eighth, is then made to wait as well. This synchronization between the pipelines maintains the sequentiality of instructions.
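The behavior in FIG. 9, where a wait on one member of a combined pair also holds its partner, can be sketched by charging interlock cycles to whole issue groups. The `waits` input is hypothetical: it stands in for whatever the register-conflict interlock detects, such as the a0 conflict between the seventh and eighth instructions. One group per cycle and in-order issue are assumptions of the sketch.

```python
from typing import Dict, Hashable, Sequence, Tuple


def operand_fetch_cycles(groups: Sequence[Tuple[Hashable, ...]],
                         waits: Dict[int, int]) -> Dict[Hashable, int]:
    """Cycle in which each instruction occupies the operand fetch stage.

    groups: issue groups in program order (one or two instructions each).
    waits:  extra interlock cycles charged to group k (hypothetical input,
            e.g. a register not yet produced by an earlier instruction).
    """
    cycles: Dict[Hashable, int] = {}
    cycle = 0
    for k, group in enumerate(groups):
        cycle += waits.get(k, 0)  # the whole group waits, not just the member with the conflict
        for insn in group:
            cycles[insn] = cycle  # both pipelines hold the pair in the same stage
        cycle += 1                # the next group follows in order, one cycle later
    return cycles
```

For a hypothetical grouping [(6, 7), (8, 9), (10,)] with one wait cycle charged to the (8, 9) group, instruction 9 is delayed along with 8 even though it has no conflict of its own, and instruction 10 is delayed behind both.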

[Effect of the Invention]

According to the present invention, multiple instructions can be executed simultaneously while maintaining instruction sequentiality, which simplifies exclusive control of the general-purpose register file and makes higher performance possible. In addition, because the single status register is guaranteed to be updated in instruction order, the state of the arithmetic processing unit is easy to manage.

[Brief Description of the Drawings]

FIG. 1 is a pipeline configuration diagram of an embodiment of the present invention; FIG. 2 is a configuration diagram of a computer system to which the present invention is applied; FIG. 3 shows the internal configuration of the arithmetic processing unit in FIG. 2; FIG. 4 shows the internal configuration of the instruction fetch unit in FIG. 3; FIG. 5 shows the internal configuration of the instruction execution unit in FIG. 3; FIG. 6 shows the internal configuration of the status code generation circuit in FIG. 5; FIG. 7 shows the combinations of instructions that can be combined; and FIGS. 8 and 9 show examples of pipeline stage flows.

500: instruction fetch stage; 503: decode-and-combine stage; 504, 511: address calculation stages; 508, 515: operand fetch stages; 509, 516: operation stages; 522: multiport register file; 523: operand cache.

Claims

[Claims]
1. A multiple instruction simultaneous processing method in a multistage pipeline computer comprising: an instruction reading device that simultaneously reads multiple instructions from a main memory storing instructions and operands; a plurality of decoding devices that decode the read instructions and identify the instruction type and the operand types; a plurality of operand reading devices that read operands from the main memory or a general-purpose register file; and a plurality of arithmetic units that perform operations on the read operands according to the instruction type; wherein the decoding device has decoding means for decoding multiple instructions simultaneously, identification means for identifying whether the decoded instructions can be executed in parallel, and means for combining instructions that can be executed in parallel, and the combined instructions are pipeline-processed synchronously between the plurality of operand reading means and the plurality of arithmetic units.
2. The multiple instruction simultaneous processing method according to claim 1, having at least one status register that indicates the state of an operation result, wherein the states of the operation results of multiple instructions executed in combination are reflected in the status register according to the order of the instructions.
3. The multiple instruction simultaneous processing method according to claim 1, having at least one status register that indicates the state of an operation result, means for transferring the state of an operation result between the plurality of pipelines, and means for selecting either the state of an operation result transferred from another pipeline or the contents of the status register, wherein whether to execute a conditional branch instruction is determined based on the selected result.
4. The multiple instruction simultaneous processing method according to claim 1, having means for transferring operation results between the plurality of pipelines, wherein the transferred operation results are used in operations for other instructions.
5. The multiple instruction simultaneous processing method according to claim 1, wherein the multiple instructions are processed by a plurality of pipeline processing devices each comprising hardware of identical logic.
6. The multiple instruction simultaneous processing method according to claim 1, having a general-purpose register file with multiple read/write ports and a cache memory with multiple read/write ports, both shared by the plurality of pipeline processing devices, whereby multiple instructions are processed simultaneously.
7. A multiple instruction simultaneous processing method in a multistage pipeline computer comprising: an instruction reading device that simultaneously reads a plurality of variable-length instructions from a main memory storing variable-length instructions and operands; a plurality of decoding devices that decode the read instructions and identify the instruction type and the operand types; a plurality of operand reading devices that read the necessary operands based on the decoding results of the decoding devices; and a plurality of arithmetic units that perform operations on the operands according to the instruction type; wherein the decoding device has a plurality of decoders each having a bit width equal to the minimum unit of a variable-length instruction, the decoders simultaneously decode each minimum unit of the plurality of instructions read from the main memory, the beginning of each instruction is identified based on the decoded results, and multiple instructions are decoded simultaneously.