JP3757768B2

JP3757768B2 - Issuing control system for scalar memory access instruction during vector memory access

Info

Publication number: JP3757768B2
Application number: JP2000251732A
Authority: JP
Inventors: 篤山代屋
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2000-08-23
Filing date: 2000-08-23
Publication date: 2006-03-22
Anticipated expiration: 2020-08-23
Also published as: JP2002063154A

Description

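[0001]
[Technical Field of the Invention]
The present invention relates to an issue control system for scalar memory access instructions during vector memory access. In a computer system comprising a scalar processor and a vector processor, the system guarantees synchronization (consistency) between the cache (cache memory) and the memory (main memory) when a subsequent LD instruction (scalar memory load instruction) is executed while the flush (invalidation) processing for a VST instruction (a vector store instruction from the vector processor) is still in progress after that VST instruction has been executed.
[0002]
[Prior Art]
Conventionally, when two or more different processors access a single memory (for example, a shared memory), methods such as snooping have been used to keep the cache in each processor coherent. With these methods, when the memory-side data corresponding to data registered in a cache is changed by another processor, the corresponding portion of the cache is invalidated.
[0003]
The same applies to the relationship between a scalar processor and a vector processor. When the vector processor writes data to memory and thereby rewrites a memory area corresponding to data registered in the cache of the scalar processor, the corresponding portion of that cache must be invalidated (flushed).
[0004]
What matters for performance here is the calculation of the area to invalidate and the invalidation processing time (flush processing time). If the area is calculated exactly, no extra portion is flushed, no cache entries are discarded needlessly, and performance does not suffer. However, calculating the area exactly makes the flush processing time correspondingly longer.
[0005]
The present invention concerns control of the cache in the scalar processor of a computer system having a scalar processor and a vector processor (see FIGS. 2 and 3). The instructions that appear in the present invention (the LD instruction and the VST instruction) are first described below.
[0006]
a. The LD instruction is issued by the scalar processor to read the contents of memory. When an LD instruction is issued, the cache block (see FIG. 1 and others) determines whether the data to be accessed by that LD instruction is registered in the cache. If it is registered, the access is a "cache HIT" and the data for the LD instruction is read from the cache. If it is not registered, the access is a "cache MISS" and a data acquisition request (miss request) for the LD instruction is issued to memory via the issue control circuit (see FIG. 1). In this case the LD instruction therefore reads its data from memory.
[0007]
b. The VST instruction writes the contents of a vector register in the vector processor to memory in block units. The VST instruction therefore carries the write start address in memory (start address: S), the write interval from the start address (distance: D), and the number of writes (vector length: VL). The contents of the vector register are written to memory starting at address "S" and continuing up to address "S + D × (VL - 1)".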
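The address pattern described for the VST instruction can be sketched in a few lines of Python. This is only an illustration: the function name `vst_write_addresses` and the example values are assumptions, and the sketch lists only the start address of each of the VL writes, not the extent of each written block.

```python
def vst_write_addresses(s: int, d: int, vl: int) -> list[int]:
    """Return the VL write addresses of a VST: S, S+D, ..., S+D*(VL-1)."""
    if vl < 0:
        raise ValueError("VL must be non-negative")
    return [s + d * i for i in range(vl)]


# Hypothetical example values, chosen only for illustration.
addresses = vst_write_addresses(s=0x1000, d=8, vl=4)
print([hex(a) for a in addresses])  # ['0x1000', '0x1008', '0x1010', '0x1018']
```

The last element equals S + D × (VL - 1), matching the end address given in the text.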
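[0008]
With reference to FIG. 3, the operation of the LD and VST instructions in a computer system comprising a scalar processor, a vector processor, and a memory is now described.
[0009]
The cache in the scalar processor holds a copy of part of the memory contents. If a VST instruction is executed after data in memory has been registered in the cache, then, depending on S, D, and VL, the VST instruction may write to a portion that is already registered in the cache (the hatched portion in FIG. 3).
[0010]
In this case the VST instruction writes data directly from the vector processor to memory, and nothing is written to the cache in the scalar processor. The cache must therefore invalidate (flush) the data that is already registered (the hatched portion in FIG. 3).
[0011]
Suppose an LD instruction following the VST instruction reads the area written by that VST instruction. If the portion it reads is not registered in the cache, the LD instruction becomes a cache MISS and fetches the data from memory. On the memory side, the LD instruction is executed after the write to memory by the VST instruction has completed. The LD instruction can therefore read the (correct) data as rewritten by the VST instruction.
[0012]
In contrast, if the LD instruction following the VST instruction reads the hatched portion in FIG. 3, that is, an area that was already registered in the cache and has been rewritten by the VST instruction, the LD instruction becomes a cache HIT and tries to read the data from the cache. On the memory side, however, that portion has been rewritten by the VST instruction, so reading from the cache as is would return incorrect (stale) data. The issue of the LD instruction must therefore wait until the flush processing triggered by the VST instruction has invalidated that cache entry.
[0013]
Incidentally, when the LD instruction is executed after the invalidation, it becomes a cache MISS because the cache entry has been invalidated. A data acquisition request is issued to memory, and the data as rewritten by the VST instruction is read from memory.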
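The hazard described in paragraphs [0010] to [0013] can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not the circuit in the patent: the cache and memory are plain dictionaries, and the names `ld`, `vst`, `stale`, and `fresh` are hypothetical.

```python
memory = {0x100: "old"}
cache = {0x100: "old"}   # address -> cached copy; presence stands in for a set V bit


def ld(address):
    """Scalar LD: return (source, value); a miss refills the cache from memory."""
    if address in cache:
        return "cache", cache[address]
    cache[address] = memory[address]
    return "memory", memory[address]


def vst(address, value):
    """Vector store: writes memory only and bypasses the scalar cache."""
    memory[address] = value


vst(0x100, "new")
stale = ld(0x100)   # hit before the flush: the cache still holds the old value
cache.pop(0x100)    # flush processing invalidates the entry
fresh = ld(0x100)   # miss after the flush: the new value comes from memory
print(stale, fresh)  # ('cache', 'old') ('memory', 'new')
```

The first LD shows why a cache HIT on a rewritten area is unsafe until the entry has been invalidated.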
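[0014]
As described above, in the prior art an LD instruction that reads data from the cache (a scalar LD instruction) cannot be executed while flush processing is in progress. When execution of the LD instruction stops in this way, the processor itself stops (HOLDs).
[0015]
That is, conventionally, when the access address of an LD instruction falls within "the area to which the contents of the vector register are written in block units by a preceding VST instruction" (hereinafter called the "boundary area"), the issue of that LD instruction was suppressed until the flush processing for that VST instruction had finished.
[0016]
Prior-art improvements therefore concentrated on raising the flush processing speed, so as to shorten as far as possible the time during which the LD instruction is stalled, that is, the time during which the processor is stopped.
[0017]
[Problems to Be Solved by the Invention]
As described above, the conventional issue control system for scalar memory access instructions during vector memory access has the problem that a subsequent LD instruction cannot be executed while the flush processing caused by the execution of a VST instruction is in progress, and that the stalled LD instruction causes the processor itself to stop (HOLD).
[0018]
As also noted above, earlier attempts to address this problem focused on raising the flush processing speed in order to shorten the time during which the LD instruction is stalled, that is, the time during which the processor is stopped.
[0019]
The present invention takes a different viewpoint. During the flush processing caused by issuing a VST instruction, every subsequent LD instruction whose target address lies within the boundary area of that VST instruction is treated as a cache MISS and issues a miss request to the memory side. The LD instruction therefore does not stall, and the instructions that follow it can also be executed. This makes it possible to speed up instruction processing (improve processor performance) in a computer system comprising a scalar processor and a vector processor.
[0020]
That is, in view of the above, the object of the present invention is to provide an issue control system for scalar memory access instructions during vector memory access in which, when the access address of a subsequent LD instruction lies within the boundary area of a VST instruction, the LD instruction is issued as is if it misses the cache, and, if it hits the cache, the data is not taken from the cache: the hit is changed to a cache MISS, a miss request is issued, and the data is fetched from memory.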
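[0021]
Patent publications on prior art related to the issue control system of the present invention include Japanese Unexamined Patent Publication No. S61-296472, Japanese Unexamined Patent Publication No. H01-222375, and Japanese Examined Patent Publication No. H03-053667.
[0022]
However, the techniques described in these publications differ from the present invention in their basic configuration. The present invention improves processor performance with a simpler configuration than those techniques (a configuration using a "determination circuit", described later).
[0023]
Specifically, the differences are as described in a to c below.
[0024]
a. The technique (buffer storage device) described in Japanese Unexamined Patent Publication No. S61-296472 requires "in-area access signal invalidating means for storing in-area access signal invalidation information that, in correspondence with the main storage block address information stored in the tag storage means, invalidates the in-area access signal output from the area comparison circuit". The present invention, in contrast, has a configuration that needs no means corresponding to such in-area access signal invalidating means.
[0025]
b. The technique (buffer storage device) described in Japanese Unexamined Patent Publication No. H01-222375 requires "address check means that, during the invalidation processing period, holds part of the main storage address information judged by the area check means to be an area match, together with a V bit indicating the validity of that part of the main storage address information, compares it with part of the main storage address of a scalar load request following the scalar load request, outputs an address match signal when they match, and suppresses the area match signal from the area check means". The present invention, in contrast, has a configuration that needs no means corresponding to such address check means.
[0026]
c. The technique (information processing apparatus) described in Japanese Examined Patent Publication No. H03-053667 provides a bypass circuit, namely "a cache control circuit that, when an in-area detection signal is output from the area detection circuit in response to a scalar data load command from the command circuit, controls the scalar data load command so that it bypasses the buffer memory circuit and the tag storage circuit and is sent directly to the main storage device". The present invention, in contrast, can achieve performance equivalent to this bypass circuit with a far simpler (lower-cost) circuit, the determination circuit.
[0027]
Japanese Unexamined Patent Publications No. S61-296472 and No. H01-222375 also point out a drawback of the prior art relative to the inventions they describe: "if a subsequent LD request to the same address arrives while a request that missed is requesting data from memory, that request was also issued to memory, which slows operation". Because the address is the same, a single cache registration should suffice. Instead, the subsequent (redundant) registration data is written repeatedly over the cache block that has already been registered, which incurs pointless cache registration time and delays the data read.
[0028]
In the present invention, by contrast, under the control of the "issue check circuit" described later, when an LD instruction for a given access address has been sent to memory as a request, a subsequent LD instruction accessing the same address is not issued (it is made to wait). That is, when the preceding LD request returns, the data block containing its data is registered in the cache, so subsequent LD instructions to the same address (more precisely, to an address within that data block) are held until the registration is done. They are then re-executed after the data has been registered in the cache, so no pointless LD instruction is issued (making them wait lets them read the data sooner, because there is no redundant cache registration), and the performance degradation mentioned in the above publications does not occur.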
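The waiting behavior described in paragraph [0028] can be sketched as follows. This is a minimal model under stated assumptions: the class name `IssueCheck`, its methods, and the 64-byte `BLOCK_SIZE` are hypothetical, and the model covers only LD instructions that would otherwise go to memory.

```python
BLOCK_SIZE = 64  # assumed data block size in bytes; the source gives no value


class IssueCheck:
    """Hold back LDs whose data block already has an outstanding miss request."""

    def __init__(self):
        self.pending_blocks = set()  # blocks with a miss request in flight
        self.waiting = []            # addresses of LDs made to wait

    def on_ld_miss(self, address):
        block = address // BLOCK_SIZE
        if block in self.pending_blocks:
            self.waiting.append(address)   # same block: do not issue again
            return "wait"
        self.pending_blocks.add(block)     # first request for this block
        return "issue"

    def on_fill(self, address):
        """The block was registered in the cache; release the LDs waiting on it."""
        block = address // BLOCK_SIZE
        self.pending_blocks.discard(block)
        released = [a for a in self.waiting if a // BLOCK_SIZE == block]
        self.waiting = [a for a in self.waiting if a // BLOCK_SIZE != block]
        return released


check = IssueCheck()
first = check.on_ld_miss(0x1000)    # "issue"
second = check.on_ld_miss(0x1008)   # "wait": same 64-byte block as 0x1000
other = check.on_ld_miss(0x2000)    # "issue": different block
released = check.on_fill(0x1000)    # [0x1008] can now be re-executed
```

Only one request per data block reaches memory; the released LD is re-executed once the block is in the cache.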
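[0029]
[Means for Solving the Problems]
The issue control system for scalar memory access instructions during vector memory access according to the present invention comprises, in the scalar processor of a computer system having a scalar processor, a vector processor, and a memory: a cache block, consisting of a cache and its control circuit, that performs cache HIT/MISS determination; an issue check circuit that accepts instructions from the processor core and, on accepting an LD instruction, checks for a preceding miss request with the same access address as that LD instruction and, if one exists, makes the LD instruction wait to be issued; a flush processing circuit that calculates the write range of a VST instruction from its S, D, and VL, determines whether data within that range is already registered in the cache, and, if it is, sends a flush request to the cache block so that the V bit for the registered portion on the cache block is reset; a boundary check circuit that holds information indicating the range of the boundary area of the VST instruction, calculated from its S, D, and VL, and performs a boundary check, determining whether a subsequent LD instruction accesses an address within that range, until the flush processing for that VST instruction in the flush processing circuit has finished; an issue control circuit that controls instruction issue and outputs the data acquisition request that issues an LD instruction that became a cache MISS to the memory side; and a determination circuit that, when it judges from the signals from the cache block and the boundary check circuit that "an LD instruction following a VST instruction is a cache HIT and its access address is within the boundary area of that VST instruction", changes that subsequent LD instruction from a cache HIT to a cache MISS and issues the instruction for that control to the issue control circuit.
[0030]
As one desirable aspect, the above system includes a determination circuit that recognizes "cache HIT or cache MISS" from the HIT signal generated by the cache block, recognizes "inside or outside the boundary area" from the in-boundary determination signal generated by the boundary check circuit, and, in order to realize "control such that, if the access address of an LD instruction following a VST instruction is within the boundary area of that VST instruction, a data acquisition request is issued to memory without reading data from the cache even if the LD instruction hits the cache", outputs a request issue instruction to the issue control circuit during that control in the same way as for a cache MISS.
[0031]
More generally, the issue control system of the present invention can be expressed as comprising, in a computer system having a scalar processor with a cache block, a vector processor, and a memory: flush processing means that performs the flush processing caused by the execution of a VST instruction; boundary check means that checks whether the access address of an LD instruction following the VST instruction is within the boundary area of that VST instruction (boundary check); and determination means that, when an LD instruction following the VST instruction is issued, uses the result of the boundary check by the boundary check means for that VST instruction and the result of the cache HIT/MISS determination by the cache block to change the access from a cache HIT to a cache MISS when it determines that "the subsequent LD instruction is a cache HIT and its access address is within the boundary area of the VST instruction".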
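[0032]
[Embodiments of the Invention]
The present invention is now described in detail with reference to the drawings.
[0033]
FIG. 1 is a block diagram showing the configuration of the issue control system for scalar memory access instructions during vector memory access according to an embodiment of the present invention.
[0034]
Referring to FIG. 1, the system according to this embodiment comprises: an issue check circuit 11 that accepts requests (instructions) from the processor core; a flush processing circuit 12 that performs the flush processing caused by the execution of a VST instruction; a cache block 13 consisting of a cache and its control circuit; a boundary check circuit 14 that checks whether an LD instruction following a VST instruction falls within the boundary area of that VST instruction; a determination circuit 15 that determines, from the HIT signal from the cache block 13 and the in-boundary determination signal from the boundary check circuit 14, whether a request issue instruction should be given to an issue control circuit 16; and the issue control circuit 16, which controls instruction issue, including outputting the data acquisition request (miss request) that issues an LD instruction that became a cache MISS to the memory 30 side.
[0035]
FIG. 2 is a block diagram showing the relationship between the components of the system shown in FIG. 1 and the scalar processor 10, vector processor 20, and memory 30 of the computer system to which the present invention is applied.
[0036]
As shown in FIG. 2, each component in FIG. 1 is implemented on the scalar processor 10.
[0037]
FIG. 3, as mentioned earlier, is a block diagram showing how instructions and data are exchanged among the scalar processor 10, the vector processor 20, and the memory 30.
[0038]
FIG. 4 shows the detailed configuration of the issue check circuit 11 in FIG. 1.
[0039]
The issue check circuit 11 comprises a REQ buffer group 111, a comparator group 112, a retry request signal generation circuit 113, and an issue suppression circuit 114.
[0040]
FIG. 5 shows the detailed configuration of the flush processing circuit 12 and the cache block 13 in FIG. 1.
[0041]
The flush processing circuit 12 comprises a flush address array (FAA) 121, which is a copy of the address array 131 in the cache block 13, an adder 122, and a comparator 123.
[0042]
The cache block 13 comprises an address array (AA) 131 whose entries (ENTRY0, ENTRY1, ..., ENTRYn) each carry a V bit (V0, V1, ..., Vn, where n is a positive integer), a data array (DA) 132 consisting of data blocks (BLOCK0, BLOCK1, ..., BLOCKn), a comparator 133, and a selector 134.
[0043]
FIG. 6 shows the detailed configuration of the boundary check circuit 14 and the determination circuit 15 in FIG. 1.
[0044]
The boundary check circuit 14 comprises a multiplier 141, an adder 142, a boundary lower-limit register 143, a boundary upper-limit register 144, and a magnitude comparator 145.
[0045]
The determination circuit 15 comprises a NOT gate 151 and an OR gate 152.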
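[0046]
Next, the overall operation of the system according to this embodiment, configured as above, is described in detail with reference to FIGS. 1 to 6.
[0047]
(1) First, the operation when a VST instruction is executed is described.
[0048]
When a VST instruction is executed, it passes through the issue check circuit 11 and the issue control circuit 16 and is issued to the vector processor 20.
[0049]
At the same time, the VST instruction passes through the issue check circuit 11 and enters the flush processing circuit 12 and the boundary check circuit 14.
[0050]
The flush processing circuit 12 and the boundary check circuit 14 then operate as described in a and b below.
[0051]
a. Operation of the flush processing circuit 12
The flush processing circuit 12 calculates the write range of the VST instruction (the range of the area rewritten in the memory 30) from the start address (S), distance (D), and vector length (VL) of the VST instruction, and determines whether data within the range written by that VST instruction is already registered in the cache.
[0052]
If it judges that the data is "registered", it sends a flush request (cache invalidation request) to the cache block 13 so that the V bit on the cache for the registered portion (see FIG. 5) is reset (flushed).
[0053]
b. Operation of the boundary check circuit 14
The boundary check circuit 14 sets the write range of the VST instruction (the range of the boundary area), calculated from its S, D, and VL, and checks whether a subsequent LD instruction accesses an address within that range (boundary check). The boundary check continues until the flush processing in the flush processing circuit 12 (see a above) has finished. When the flush processing ends, the check operation in the boundary check circuit 14 also ends.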
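[0054]
(2) Next, the operation when an LD instruction following a VST instruction is issued is described.
[0055]
Consider the operation when an LD instruction following a VST instruction is issued while, as a result of executing the VST instruction, flush processing is under way and the boundary is set up (the boundary check is being performed).
[0056]
As noted earlier, in the conventional system an LD instruction following a VST instruction that accesses the boundary area of that VST instruction might access an area of the cache that is rewritten by the VST instruction (this happens when a subsequent LD instruction accessing the boundary area hits the cache), so its issue is suppressed until the flush processing has finished (it waits to be issued). Making every subsequent LD instruction wait until the end of the flush processing, however, would degrade performance considerably. An LD instruction that reads an address outside the boundary area of the VST instruction is therefore allowed to issue even while flush processing is in progress.
[0057]
To realize this control, the prior art calculates the range of the boundary area (the area rewritten by the VST instruction) from the S, D, and VL of the VST instruction in the boundary check circuit 14. When a subsequent LD instruction is issued, its issue is permitted if its address is outside the boundary area and suppressed if its address is inside.
[0058]
This embodiment, and hence the present invention, uses the determination circuit 15 so that LD instructions whose issue was conventionally suppressed because their access address falls within the boundary area are issued as is.
[0059]
That is, the determination circuit 15 receives the HIT signal from the cache block 13 (a signal indicating the result of the cache HIT/MISS determination for the LD instruction, with logic "1 (ON)" for a cache HIT) and the in-boundary determination signal from the boundary check circuit 14 (a signal indicating the result of the boundary check, with logic "1 (ON)" when the address is within the boundary area). For "an LD instruction that hits the cache (becomes a cache HIT) and is within the boundary area of the VST instruction", it changes the cache HIT to a cache MISS, so that the data is not read from the cache, and sends an instruction (request issue instruction) to the issue control circuit 16 to "issue a data acquisition request to the memory 30 and fetch the data from the memory 30". This removes the risk that a subsequent LD instruction accesses a cache area rewritten by the VST instruction, so the issue of subsequent LD instructions no longer needs to be suppressed.
[0060]
When LD instructions to the same address that miss the cache within the boundary area are issued in succession, the control by the issue check circuit 11 and related circuits ensures that a request (miss request) is sent to the memory 30 only for the first of them. ("Same address" here covers not only addresses that match exactly but also addresses that access the same data block.) The later LD instructions to the same address are held within the scalar processor 10 until the data of the first LD instruction has been registered in the cache. Once that data is registered, the subsequent LD instructions to the same address are executed. Because the data is by then registered in the cache, all of them read it from the cache (they become cache HITs). This avoids the "delay in data reading caused by pointless cache registration time".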
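[0061]
(3) The operation when an LD instruction is issued during the flush and boundary processing caused by the execution of a VST instruction is now described in more detail, case by case (see FIG. 4 for the operation of the issue check circuit 11).
[0062]
A. When there is no preceding miss request
An LD instruction (LD request) issued from the processor core enters the issue check circuit 11. At the same time, in response to a HIT/MISS determination request, the cache block 13 performs the cache HIT/MISS determination (whether the LD instruction becomes a cache HIT or a cache MISS).
[0063]
The issue check circuit 11 determines whether a preceding miss request exists (preceding miss request check). In this case there is none, so the LD instruction (request) passes straight through the issue suppression circuit 114 and is issued to the lower-level blocks.
[0064]
Next, the boundary check circuit 14 determines whether the access address of the LD instruction is within the boundary area of the VST instruction (boundary inside/outside determination).
[0065]
Based on the results of the cache HIT/MISS determination, the preceding miss request check (here, "no preceding miss request"), and the boundary inside/outside determination, the cases a to c below are distinguished and the following operations are performed.
[0066]
a. In the case "cache HIT, no preceding miss request, outside the boundary area", the cache data is read from the cache block 13 and returned to the processor core as the data for the LD instruction.
[0067]
b. In the case "cache HIT, no preceding miss request, inside the boundary area", the determination circuit 15 performs MISS conversion (converting the cache HIT to a cache MISS) and instructs the issue control circuit 16 (request issue instruction) to issue a miss request (data acquisition request) to the memory 30 side.
[0068]
c. In the case "cache MISS, no preceding miss request, inside or outside the boundary area", the LD instruction passes through the issue control circuit 16 and is issued to the memory 30 (a miss request is issued). At this time, information on the issued miss request (miss request information such as the access address of the LD instruction) is registered in the REQ buffer group 111 in the issue check circuit 11 (REQ buffer 0 or REQ buffer 1 in FIG. 4). This operation is performed whether or not the access address of the LD instruction is within the boundary area.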
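[0069]
B. When there is a preceding miss request
An LD instruction issued from the processor core enters the issue check circuit 11. At the same time, the cache block 13 performs the cache HIT/MISS determination.
[0070]
The issue check circuit 11 performs the preceding miss request check. In this case a preceding miss request exists, so the comparator group 112 uses the miss request information held in the REQ buffer group 111 for that miss request to determine whether the access address of the LD instruction matches the access address of the preceding LD instruction in the miss request information (address match determination).
[0071]
The boundary check circuit 14 also determines whether the access address of the LD instruction is within the boundary area of the VST instruction (boundary inside/outside determination).
[0072]
Based on the results of the cache HIT/MISS determination, the preceding miss request check (here, "preceding miss request present"), the address match determination, and the boundary inside/outside determination, the cases a to e below are distinguished and the following operations are performed.
[0073]
a. In the case "cache HIT, preceding miss request present, no address match, outside the boundary area", the cache data is read from the cache block 13 and returned to the processor core as the data for the LD instruction.
[0074]
b. In the case "cache HIT, preceding miss request present, no address match, inside the boundary area", the determination circuit 15 performs MISS conversion (converting the cache HIT to a cache MISS) and instructs the issue control circuit 16 (request issue instruction) to issue a miss request (data acquisition request) to the memory 30 side.
[0075]
c. In the case "cache HIT, preceding miss request present, address match, inside or outside the boundary area", the retry request signal generation circuit 113 generates a request wait request signal. Because of this signal, the LD instruction issued from the processor core is not accepted by the cache and is registered in the wait buffer (this happens whether or not the access address of the LD instruction is within the boundary area).
[0076]
d. In the case "cache MISS, preceding miss request present, no address match, inside or outside the boundary area", the LD instruction passes through the issue control circuit 16 and is issued to the memory 30 (a miss request is issued). At this time, information on the issued miss request (miss request information such as the access address of the LD instruction) is registered in the REQ buffer group 111 in the issue check circuit 11 (whichever of REQ buffer 0 and REQ buffer 1 is free). If neither REQ buffer 0 nor REQ buffer 1 is free, a request wait request signal is generated and the LD instruction is registered in the wait buffer. This operation is performed whether or not the access address of the LD instruction is within the boundary area.
[0077]
e. In the case "cache MISS, preceding miss request present, address match, inside or outside the boundary area", the retry request signal generation circuit 113 generates a request wait request signal. Because of this signal, the LD instruction issued from the processor core is not accepted by the cache and is registered in the wait buffer (this happens whether or not the access address of the LD instruction is within the boundary area).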
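Cases A (a to c) and B (a to e) above can be condensed into one decision function. The sketch below is an illustrative summary, not the circuit itself: the function name `ld_action`, its parameters, and the returned strings are assumptions. The REQ buffer check is applied only to a genuine cache MISS, because the source states the "no free REQ buffer" rule only for case B-d.

```python
def ld_action(hit, in_boundary, prior_miss, address_match=False, req_buffer_free=True):
    """Classify an LD issued while a VST flush and boundary check are active."""
    if prior_miss and address_match:
        return "wait"            # B-c, B-e: same block already requested
    if hit and not in_boundary:
        return "read cache"      # A-a, B-a
    if not hit and not req_buffer_free:
        return "wait"            # B-d when REQ buffer 0 and 1 are both in use
    return "miss request"        # A-b, B-b (MISS conversion); A-c, B-d (real miss)


# Case A: no preceding miss request.
print(ld_action(hit=True, in_boundary=False, prior_miss=False))   # read cache
print(ld_action(hit=True, in_boundary=True, prior_miss=False))    # miss request
print(ld_action(hit=False, in_boundary=False, prior_miss=False))  # miss request

# Case B: a preceding miss request exists.
print(ld_action(hit=True, in_boundary=True, prior_miss=True, address_match=True))  # wait
```

Apart from "no free REQ buffer", the only condition that makes an LD wait is an address match with a preceding miss request. Being inside the boundary area never stalls it.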
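[0078]
(4) Next, the operation of the flush processing circuit 12 and the cache block 13 in FIG. 1 when flush processing is requested is described with reference to FIG. 5.
[0079]
When a VST instruction (request) is issued, the start address (S) and the distance (D) are set in the flush processing circuit 12.
[0080]
The adder 122 in the flush processing circuit 12 then generates the addresses used to access the flush address array 121. The addresses generated are S, S + D, S + 2D, ..., S + (VL - 1) × D.
[0081]
These addresses are used to access the flush address array 121, which is a copy of the address array 131 in the cache block 13, and the comparator 123 is used to determine whether the area rewritten by the VST instruction is registered in the cache.
[0082]
If it is judged that "the cache has an entry in which data within the area changed by the VST instruction is registered", the flush processing circuit 12 issues a flush request to the cache block 13 (a request made by an invalidation signal carrying the number of the entry to be invalidated).
[0083]
In the cache block 13, this flush request resets the V bit of the corresponding entry in the address array 131.
[0084]
As a result, even if an LD instruction is executed after the flush processing has finished and its access address matches the address of that entry in the address array 131, the V bit has been reset, so the access becomes a cache MISS and the data is read from the memory 30.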
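The flush sequence in paragraphs [0079] to [0084] can be illustrated with a small simulation. It rests on several assumptions that the source does not state: a direct-mapped cache, a 64-byte block, eight entries, and the helper names `split`, `register`, `is_hit`, and `flush`. The flush address array is reduced to a copy of the tags.

```python
BLOCK_SIZE = 64    # assumed block size in bytes (not given in the source)
NUM_ENTRIES = 8    # assumed number of entries in a direct-mapped cache


def split(address):
    """Split an address into (entry index, tag) for the assumed layout."""
    block = address // BLOCK_SIZE
    return block % NUM_ENTRIES, block // NUM_ENTRIES


def register(address_array, address):
    """Register the block containing `address` and set its V bit."""
    index, tag = split(address)
    address_array[index] = {"tag": tag, "v": True}


def is_hit(address_array, address):
    """Cache HIT only if the tag matches and the V bit is set."""
    index, tag = split(address)
    entry = address_array[index]
    return entry["v"] and entry["tag"] == tag


def flush(address_array, s, d, vl):
    """Reset the V bit of every entry holding a block that the VST rewrites."""
    faa = [entry["tag"] for entry in address_array]   # tag copy, standing in for FAA 121
    invalidated = []
    for i in range(vl):                               # adder 122: S, S+D, ..., S+(VL-1)*D
        index, tag = split(s + d * i)
        if faa[index] == tag and address_array[index]["v"]:   # comparator 123
            address_array[index]["v"] = False         # flush request resets the V bit
            invalidated.append(index)
    return invalidated


address_array = [{"tag": None, "v": False} for _ in range(NUM_ENTRIES)]
for cached in (0x1000, 0x1040, 0x2080):
    register(address_array, cached)

invalidated = flush(address_array, s=0x1000, d=32, vl=4)
print(invalidated)                      # [0, 1]
print(is_hit(address_array, 0x1000))    # False: the LD now misses and goes to memory
print(is_hit(address_array, 0x2080))    # True: untouched by the VST
```

The loop visits one address per vector element, which is why the flush time grows with VL and why stalling LD instructions for its whole duration is costly.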
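[0085]
(5) Finally, the operation of the boundary check circuit 14 and the determination circuit 15 in FIG. 1 is described with reference to FIG. 6. The determination circuit 15 does not exist in the prior art. It is newly added by the present invention.
[0086]
When a VST instruction is executed, the boundary check circuit 14 takes in the start address (S), distance (D), and vector length (VL) of that VST instruction, and the multiplier 141 and the adder 142 calculate the range of the boundary area. The calculation uses the formulas "lower limit of the boundary area = S" and "upper limit of the boundary area = S + D × VL", and the resulting values are set in the boundary lower-limit register 143 and the boundary upper-limit register 144.
[0087]
When an LD instruction following the VST instruction is issued, the magnitude comparator 145 determines whether its access address is within the boundary area and outputs an in-boundary determination signal indicating the result.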
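The boundary calculation and comparison can be sketched as below. The class name `BoundaryCheck` and its methods are hypothetical. The sketch also assumes that D is non-negative and that both limits are inclusive, since the source gives the two formulas but not the exact comparison.

```python
from dataclasses import dataclass


@dataclass
class BoundaryCheck:
    lower: int = 0        # boundary lower-limit register 143
    upper: int = 0        # boundary upper-limit register 144
    active: bool = False  # True from VST issue until the flush has finished

    def set_from_vst(self, s, d, vl):
        """Lower limit = S, upper limit = S + D * VL (multiplier 141, adder 142)."""
        self.lower = s
        self.upper = s + d * vl
        self.active = True

    def flush_done(self):
        """The boundary check ends when the flush processing ends."""
        self.active = False

    def in_boundary(self, address):
        """In-boundary determination signal (magnitude comparator 145)."""
        return self.active and self.lower <= address <= self.upper


bc = BoundaryCheck()
bc.set_from_vst(s=0x1000, d=8, vl=4)   # boundary area 0x1000 to 0x1020
print(bc.in_boundary(0x1010))          # True
print(bc.in_boundary(0x0FFF))          # False
bc.flush_done()
print(bc.in_boundary(0x1010))          # False: no boundary once the flush is done
```

The check is a pair of comparisons against two registers, so it is much cheaper than the per-element search that the flush itself performs, at the cost of also covering the gaps between strided writes.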
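[0088]
Based on the HIT signal generated by the cache block 13 and the in-boundary determination signal generated by the boundary check circuit 14, the determination circuit 15 makes the determinations and performs the processing described in a to d below.
[0089]
a. If it determines that "the access address of the subsequent LD instruction is outside the boundary area and the LD instruction is a cache HIT", it controls the operation so that the data is read from the cache (no request issue instruction is output).
[0090]
b. If it determines that "the access address of the subsequent LD instruction is outside the boundary area and the LD instruction is a cache MISS", it outputs a request issue instruction to the issue control circuit 16 so that the issue control circuit 16 issues a data acquisition request (miss request) to the memory 30.
[0091]
c. If it determines that "the access address of the subsequent LD instruction is inside the boundary area and the LD instruction is a cache MISS", it outputs a request issue instruction to the issue control circuit 16, as in case b.
[0092]
d. If it determines that "the access address of the subsequent LD instruction is inside the boundary area and the LD instruction is a cache HIT", it outputs a request issue instruction to the issue control circuit 16 despite the cache HIT, so that a data acquisition request is issued to the memory 30 without reading the data from the cache. Thus, if the access address of the subsequent LD instruction is inside the boundary area, MISS conversion from cache HIT to cache MISS is performed even when the LD instruction hits the cache, and the issue control circuit 16 issues a data acquisition request to the memory 30.
[0093]
In this way, in the present invention, a data acquisition request is issued to the memory 30 whenever the address is inside the boundary area, regardless of cache HIT or cache MISS.
[0094]
In this embodiment, the above determination and processing are realized by the NOT gate 151, which receives the HIT signal generated by the cache block 13 and outputs its logical negation, and the OR gate 152, which receives the output of the NOT gate 151 and the in-boundary determination signal generated by the boundary check circuit 14 and outputs their logical OR.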
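The two-gate logic of the determination circuit can be written out and checked against cases a to d. The function name `request_issue` is an assumption used only for this illustration.

```python
def request_issue(hit: bool, in_boundary: bool) -> bool:
    """Request issue instruction = (NOT HIT) OR in-boundary."""
    not_hit = not hit                # NOT gate 151
    return not_hit or in_boundary    # OR gate 152


# Enumerate cases a to d from the text.
for in_boundary in (False, True):
    for hit in (True, False):
        print(f"in_boundary={in_boundary} hit={hit} -> {request_issue(hit, in_boundary)}")
```

The output is False only for a cache HIT outside the boundary area (case a). Every other combination sends a data acquisition request to memory.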
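[0095]
[Effects of the Invention]
As described above, the present invention removes the need to suppress the issue of a subsequent LD instruction even when its address is within the boundary area of a preceding VST instruction. This speeds up instruction processing (execution), and so improves processor performance, in a computer system having a scalar processor and a vector processor. The effect is especially pronounced when running programs that make heavy use of vector instructions.
[0096]
FIG. 7 is a diagram that illustrates the effect of the issue control system of the present invention with a concrete example.
[0097]
In the example shown in FIG. 7, an instruction sequence containing a VST instruction, an LD instruction, and other instructions (subsequent instruction 1, subsequent instruction 2, and subsequent instruction 3) forms a loop (FIG. 7 shows the first two iterations). The LD instruction is assumed to access the write area of the VST instruction.
[0098]
With the prior-art system shown in (a) of FIG. 7, the LD instruction following the VST instruction cannot be issued until the flush processing has finished, so the instructions that follow it (subsequent instructions 1, 2, and 3) are all delayed as well.
[0099]
With the present invention applied, as shown in (b) of FIG. 7, the LD instruction can be issued immediately after the VST instruction is issued.
[0100]
Comparing the two shows clearly that a considerable difference in execution time (instruction execution speed) has already appeared by the end of the second loop iteration. The gap widens further as the number of iterations increases.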
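[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of the issue control system for scalar memory access instructions during vector memory access according to an embodiment of the present invention.
[FIG. 2] A block diagram showing the relationship between the components of the system shown in FIG. 1 and the scalar processor, vector processor, and memory of the computer system to which the present invention is applied.
[FIG. 3] A block diagram showing how instructions and data are exchanged among the scalar processor, the vector processor, and the memory.
[FIG. 4] A diagram showing the detailed configuration of the issue check circuit in FIG. 1.
[FIG. 5] A diagram showing the detailed configuration of the flush processing circuit and the cache block in FIG. 1.
[FIG. 6] A diagram showing the detailed configuration of the boundary check circuit and the determination circuit in FIG. 1.
[FIG. 7] A diagram that illustrates the effect of the issue control system of the present invention with a concrete example.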
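[Explanation of Reference Signs]
10 Scalar processor
11 Issue check circuit
12 Flush processing circuit
13 Cache block
14 Boundary check circuit
15 Determination circuit
16 Issue control circuit
20 Vector processor
30 Memory
111 REQ buffer group
112 Comparator group
113 Retry request signal generation circuit
114 Issue suppression circuit
121 Flush address array
122 Adder
123 Comparator
131 Address array
132 Data array
133 Comparator
134 Selector
141 Multiplier
142 Adder
143 Boundary lower-limit register
144 Boundary upper-limit register
145 Magnitude comparator
151 NOT gate
152 OR gate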
BACKGROUND OF THE INVENTION
In a computer system including a scalar processor and a vector processor, the present invention performs a flush (invalidation) process for a VST instruction after execution of the VST instruction (vector type store instruction from the vector processor). When a subsequent LD instruction (scalar memory load instruction) is executed, a scalar memory access instruction is issued at the time of vector memory access to guarantee synchronization (coincidence) between the cache (CACHE) and the memory (main memory) It relates to the control method.
[0002]
[Prior art]
Conventionally, when two or more different processors access one memory (for example, shared memory), a method such as snooping is used to maintain cache coherency within the processor. . As a result, when the memory-side data corresponding to the cache registration data has been changed by another processor, the cache in that portion has been invalidated.
[0003]
The same applies to the relationship between the scalar processor and the vector processor. When data is written from the vector processor to the memory, the data area on the memory side corresponding to the data registered in the cache in the scalar processor may be rewritten. The corresponding part of the cache in the scalar processor needs to be invalidated (flushed).
[0004]
At this time, what is problematic in terms of performance is calculation of an invalidated area and invalidation processing time (flash processing time). If the area is calculated accurately, the extra portion is not flushed, the cache entry is not wasted, and the performance is not degraded. However, if the region is to be calculated accurately, the flash processing time increases accordingly.
[0005]
The present invention relates to control of a cache existing in a scalar processor in a computer system (see FIGS. 2 and 3) having a scalar processor and a vector processor. Here, first, instructions (LD instruction and VST instruction) appearing in the present invention will be described below.
[0006]
a. The LD instruction is an instruction issued by the scalar processor to read the contents of the memory. When the LD instruction is issued, it is determined whether or not the data to be accessed by the LD instruction is registered in the cache in the cache block (see FIG. 1 and the like). If registered, data corresponding to the LD instruction is read from the cache as “cache HIT”. On the other hand, if it is not registered, a data acquisition request (miss request) is issued to the memory as “cache MISS” via the issue control circuit (see FIG. 1) for the LD instruction. Therefore, in this case, the LD instruction reads data from the memory.
[0007]
b. The VST instruction is an instruction for writing the contents of a vector register existing in the vector processor into a memory in units of blocks. For this reason, the VST instruction includes information on the write start address (start address: S) on the memory, the write interval from the start address (distance: D), and the number of writes (vector length length: VL). The contents of the vector register are written into the memory from the address “S” on the memory and are written up to the address “S + D × (VL−1)”.
[0008]
Here, with reference to FIG. 3, operations of the LD instruction and the VST instruction in a computer system including a scalar processor, a vector processor, and a memory will be described.
[0009]
The cache existing in the scalar processor is a copy of a part of the memory contents. Here, if the VST instruction is executed after the data in the memory is registered in the cache, it is already registered in the cache indicated by the diagonal lines in FIG. 3 according to the contents of S, D, and VL. There is a case where the existing portion is written by the VST instruction.
[0010]
In this case, the VST instruction directly writes data from the vector processor to the memory, and no data is written to the cache in the scalar processor. For this reason, in the cache, it is necessary to perform invalidation (flush) processing of already registered data (in the shaded area in FIG. 3).
[0011]
When the LD instruction subsequent to the VST instruction reads an area written by the VST instruction, if the part not registered in the cache is read, the LD instruction becomes a cache MISS and retrieves data from the memory. . At this time, on the memory side, the LD instruction is executed after the writing to the memory by the VST instruction is completed. For this reason, the LD instruction can read data of the (correct) value after being rewritten by the VST instruction.
[0012]
On the other hand, when the LD instruction subsequent to the VST instruction reads the hatched portion in FIG. 3, that is, the area where the VST instruction rewrites the part already registered in the cache, this LD instruction becomes the cache HIT. Try to read data from the cache. However, on the memory side, the portion is rewritten by the VST instruction, so that if the data is read from the cache as it is, illegal data is read. For this reason, it is necessary to wait for the issuance of the LD instruction until the cache registration part of the part is invalidated by the flush process by the execution of the VST instruction.
[0013]
Incidentally, when the LD instruction is executed after invalidation, the cache is invalidated, so that the cache MISS is generated, a data acquisition request is issued to the memory, and the data after being rewritten by the VST instruction is read from the memory. be able to.
[0014]
As described above, in the conventional technique, the LD instruction (scalar LD instruction) for reading data from the cache cannot be executed during the flush process. Then, when the execution of the LD instruction is stopped in this way, the processor itself is stopped (HOLD).
[0015]
In other words, conventionally, when the access address of the LD instruction corresponds to “an area where the contents of the vector register are written to the memory in block units by the preceding VST instruction” (hereinafter referred to as “boundary area”), the VST The issuing of the LD instruction is suppressed until the flush process for the instruction is completed.
[0016]
Here, in the conventional technology, by improving with an emphasis on how to increase the flash processing speed, the technical improvement is performed so as to shorten the time that the LD instruction stops, that is, the time that the processor stops. It was.
[0017]
[Problems to be solved by the invention]
As described above, in the conventional scalar memory access instruction issuance control method at the time of vector memory access, the subsequent LD instruction cannot be executed while the flash process resulting from the execution of the VST instruction is being performed. There is a problem that the processor itself stops (HOLD) when the instruction stops.
[0018]
In addition, as described above, conventionally, by focusing on how to increase the flash processing speed and improving the above problems, the time during which the LD instruction is stopped, that is, the time during which the processor is stopped is reduced. Attempts have been made to improve the technology as much as possible.
[0019]
In the present invention, the viewpoint is changed, and during the flush processing by issuing the VST instruction, all subsequent LD instructions in which the address to be accessed is included in the boundary area of the VST instruction are issued as a cache MISS to the memory side. Therefore, it is possible to execute the subsequent instruction without stopping the LD instruction, and increase the instruction processing speed of the processor in the computer system including the scalar processor and the vector processor (improvement of the processor performance). Is possible.
[0020]
That is, in view of the above points, the object of the present invention is to issue an issuance as it is if the LD instruction is MISS to the cache when the access address of the subsequent LD instruction is in the boundary area of the VST instruction. If the LD instruction is HIT for the cache, the data is not fetched from the cache, but is changed to the cache MISS, a miss request is issued, and the data can be retrieved from the memory. It is to provide a time scalar memory access instruction issue control system.
[0021]
The patent publications related to the prior art for the scalar memory access instruction issuance control method for vector memory access according to the present invention include "JP 61-296472 A", "JP 01-222375 A", and No. 03-053667 ”.
[0022]
However, the technology described in each of the above publications differs from the present invention in the basic configuration. That is, according to the present invention, the performance of the processor is improved by a simpler configuration (a configuration using a “determination circuit” as will be described later) than the techniques described in the above publications.
[0023]
Specifically, there are differences as shown in ac below.
[0024]
a. The technique (buffer storage device) described in Japanese Patent Application Laid-Open No. 61-296472 is “intra-region access signal output from region comparison circuit corresponding to main storage block address information stored in tag storage means”. It is necessary to use “intra-area access signal invalidating means for storing intra-area access signal invalidation information”. On the other hand, the present invention has a configuration that does not require the use of such means corresponding to the in-area access signal invalidation means.
[0025]
b. The technique (buffer storage device) described in Japanese Patent Application Laid-Open No. 01-222375 discloses that “a part of the main memory address information determined to be area coincidence by the area check means is part of the main memory address information during the invalidation processing period. Is stored together with a V bit that indicates the validity of the part, and is compared with a part of the main memory address of the scalar load request subsequent to the scalar load request. When a match is found, an address match signal is output. It is necessary to use "address check means for suppressing the area coincidence signal". On the other hand, the present invention has a configuration that does not require the use of means corresponding to such address check means.
[0026]
c. The technology (information processing apparatus) described in Japanese Patent Publication No. 03-053667 discloses that “when an in-region detection signal is output from the region detection circuit in response to a scalar data load command from the command circuit, a scalar data load command is issued. Is provided with a bypass circuit called “a cache control circuit that controls to bypass the buffer memory circuit and the tag storage circuit and send them directly to the main memory”. On the other hand, according to the present invention, it is possible to achieve performance equivalent to that of the above-described bypass circuit with a much simpler circuit (a low-cost circuit) called a determination circuit.
[0027]
In Japanese Patent Laid-Open No. 61-296472 and Japanese Patent Laid-Open No. 01-222375, the following “LD request of the same address comes in while the request that became MISS is issuing a data request to the memory. The disadvantage of the prior art over the invention described in the publication is that the request is also issued to the memory ”(because it is originally the same address, only one cache registration is required, (Excess) registration data will be overwritten on the already registered cache block, meaningless cache registration time will occur, and data reading will be delayed) ing.
[0028]
On the other hand, according to the present invention, when an LD instruction with a certain access address is sent as a request to the memory under the control of an “issue check circuit” to be described later, the LD that accesses the same address that follows. The system is such that no command is issued (waiting). That is, when the preceding LD request returns, the data block including the data is registered in the cache, and the subsequent LD instruction at the same address (more precisely, the address in the data block) is registered. Waiting for issue until. Therefore, since the data is re-executed after being registered in the cache, there is no need to issue a meaningless LD instruction (the wait can be read earlier because there is no extra cache registration). There is no performance degradation as mentioned in.
[0029]
[Means for Solving the Problems]
A scalar memory access instruction issuance control system for vector memory access according to the present invention includes a cache and a control circuit for a scalar processor of a computer system including a scalar processor, a vector processor, and a memory, and performs cache HIT / MISS determination. When an instruction from a block and a processor core part is received, and when an LD instruction is received, it is checked whether or not there is a preceding miss request with the same access address as the LD instruction, and there is a preceding miss request with the same access address Is based on an issuance check circuit that waits for the issuance of the LD instruction and the S, D, and VL of the VST instruction, and calculates the write range of the VST instruction, and whether or not the data within the range is already registered in the cache. Judgment A flash processing circuit for sending a flush request to the cache block so as to reset the V bit for the registered part on the cache block when it is determined as “registered”, and a VST instruction Information indicating the range of the boundary area of the VST instruction calculated based on S, D, and VL of the VST is stored, and a boundary check for determining whether or not a subsequent LD instruction is accessing within the boundary area is flashed. 
Boundary check circuit that is executed until flush processing for the VST instruction in the processing circuit is completed, and issuance that outputs a data acquisition request for issuing an instruction to issue the cache MISS LD instruction to the memory side A control circuit; the cache block; and If it is determined by the determination based on the signal from the boundary check circuit that “the LD instruction subsequent to the VST instruction becomes the cache HIT and the access address is within the boundary area of the VST instruction”, the subsequent A determination circuit that controls to change the LD instruction from the cache HIT to the cache MISS and issues an instruction for the control to the issue control circuit.
[0030]
Here, as one preferred aspect, the determination circuit of the above-described scalar memory access instruction issuance control system for vector memory access is configured as follows. It recognizes from the HIT signal generated by the cache block whether the LD instruction is a cache HIT or a cache MISS, and recognizes from the in-boundary determination signal generated by the boundary check circuit whether the access address is inside or outside the boundary area. Then, in order to realize the control “if the access address of the LD instruction following the VST instruction is within the boundary area of the VST instruction, issue a data acquisition request to the memory without reading data from the cache even if the LD instruction hits the cache”, it outputs a request issuance instruction to the issue control circuit in the same way as in the case of a cache MISS.
[0031]
It should be noted that the scalar memory access instruction issuance control system for vector memory access according to the present invention can be expressed more generally as follows. In a computer system having a scalar processor with a cache block, a vector processor, and a memory, the system comprises: flush processing means for performing the flush processing resulting from execution of a VST instruction; boundary check means for checking whether the access address of an LD instruction subsequent to the VST instruction is within the boundary area of the VST instruction (boundary check); and determination means for performing control, when the LD instruction subsequent to the VST instruction is issued, to change the cache HIT to a cache MISS if it is determined, from the result of the boundary check by the boundary check means for the VST instruction and the result of the cache HIT/MISS determination by the cache block, that “the subsequent LD instruction is a cache HIT and its access address is within the boundary area of the VST instruction”.
[0032]
DETAILED DESCRIPTION OF THE INVENTION
Next, the present invention will be described in detail with reference to the drawings.
[0033]
FIG. 1 is a block diagram showing a configuration of a scalar memory access instruction issuance control system during vector memory access according to an embodiment of the present invention.
[0034]
Referring to FIG. 1, the scalar memory access instruction issuance control system for vector memory access according to the present embodiment includes: an issue check circuit 11 that accepts requests (instructions) from the processor core part; a flush processing circuit 12 that performs the flush processing resulting from execution of a VST instruction; a cache block 13 that includes the cache and its control circuit; a boundary check circuit 14 that checks whether the access address of an LD instruction subsequent to the VST instruction is within the boundary area of the VST instruction; a determination circuit 15 that determines, based on the HIT signal from the cache block 13 and the in-boundary determination signal from the boundary check circuit 14, whether a request issuance instruction should be sent to the issue control circuit 16; and an issue control circuit 16 that performs instruction issue control for outputting a data acquisition request (miss request) to the memory 30 side for an LD instruction that has become a cache MISS.
[0035]
FIG. 2 is a block diagram showing the relationship between the constituent elements of the scalar memory access instruction issuance control system for vector memory access shown in FIG. 1 and the scalar processor 10, vector processor 20, and memory 30 of the computer system to which the present invention is applied.
[0036]
As shown in FIG. 2, each component in FIG. 1 is realized on a scalar processor 10.
[0037]
FIG. 3 is a block diagram showing an input / output mode of instructions and data among the scalar processor 10, the vector processor 20, and the memory 30, as mentioned above.
[0038]
FIG. 4 is a diagram showing a detailed configuration of the issue check circuit 11 in FIG. 1.
[0039]
The issue check circuit 11 includes a REQ buffer group 111, a comparator group 112, a retry request signal generation circuit 113, and an issue suppression circuit 114.
[0040]
FIG. 5 is a diagram showing a detailed configuration of the flush processing circuit 12 and the cache block 13 in FIG. 1.
[0041]
The flush processing circuit 12 includes a flush address array (FAA) 121 that is a copy of the address array 131 in the cache block 13, an adder 122, and a comparator 123.
[0042]
The cache block 13 includes an address array (AA) 131 having entries (ENTRY0, ENTRY1,..., ENTRYn) to which V bits (V0, V1,..., Vn (n is a positive integer)) are added, and each data block. A data array (DA) 132 composed of (BLOCK0, BLOCK1,..., BLOCKn), a comparator 133, and a selector 134 are included.
[0043]
FIG. 6 is a diagram showing a detailed configuration of boundary check circuit 14 and determination circuit 15 in FIG.
[0044]
The boundary check circuit 14 includes a multiplier 141, an adder 142, a boundary lower limit register 143, a boundary upper limit register 144, and a magnitude comparator 145.
[0045]
The determination circuit 15 includes a NOT gate 151 and an OR gate 152.
[0046]
Next, the overall operation of the scalar memory access instruction issuance control system for vector memory access according to the present embodiment, configured as described above, will be described in detail with reference to the drawings.
[0047]
(1) First, the operation when the VST instruction is executed will be described.
[0048]
When the VST instruction is executed, the VST instruction is issued to the vector processor 20 through the issue check circuit 11 and the issue control circuit 16.
[0049]
At the same time, the VST instruction passes through the issue check circuit 11 and enters the flush processing circuit 12 and the boundary check circuit 14.
[0050]
In this case, the flush processing circuit 12 and the boundary check circuit 14 operate as shown in a and b below.
[0051]
a. Operation in the flush processing circuit 12
Based on the start address (S), the distance (D), and the vector length (VL) of the VST instruction, the flush processing circuit 12 calculates the write range of the VST instruction (the range of the area to be rewritten on the memory 30) and determines whether data within that range is already registered in the cache.
[0052]
If the data is determined to be registered, a flush request (cache invalidation request) is sent to the cache block 13 so that the V bit (see FIG. 5) on the cache for the registered portion is reset (flushed).
[0053]
b. Operation in the boundary check circuit 14
The boundary check circuit 14 sets the write range (boundary area range) of the VST instruction, calculated from S, D, and VL of the VST instruction, and executes a check (boundary check) of whether a subsequent LD instruction accesses that range. This boundary check is performed until the flush processing in the flush processing circuit 12 (see a above) is completed; when the flush processing ends, the check operation in the boundary check circuit 14 also ends.
[0054]
(2) Next, the operation when the LD instruction is issued following the VST instruction will be described.
[0055]
Consider the behavior when an LD instruction subsequent to the VST instruction is issued while, as described above, the flush processing is being executed and the boundary has been set by execution of the VST instruction (that is, while the boundary check is being performed).
[0056]
As described above, in the conventional scalar memory access instruction issuance control system for vector memory access, if an LD instruction subsequent to the VST instruction accesses the boundary area of the VST instruction, it may access an area on the cache that is rewritten by the VST instruction (this happens when a subsequent LD instruction that accesses the boundary area hits the cache), so its issuance is suppressed (it waits for issuance) until the flush processing is completed. However, making all subsequent LD instructions wait until the end of the flush processing would degrade performance considerably. For this reason, an LD instruction that reads an address outside the boundary area of the VST instruction can be issued even while the flush processing is in progress.
[0057]
In order to realize such control, the conventional technique calculates the range of the boundary area (the area rewritten by the VST instruction) from S, D, and VL of the VST instruction in the boundary check circuit 14. Then, when a subsequent LD instruction is issued, the issuance of the LD instruction is permitted if its address is not within the boundary area and is suppressed if its address is within the boundary area.
[0058]
In the present embodiment, and hence in the present invention, the determination circuit 15 is used so that an LD instruction whose access address is within the boundary area, which was conventionally held back, can also be issued.
[0059]
That is, the determination circuit 15 receives the HIT signal from the cache block 13 (a signal indicating the result of the cache HIT/MISS determination for the LD instruction, which is logic “1 (ON)” in the case of a cache HIT) and the in-boundary determination signal from the boundary check circuit 14 (a signal indicating the result of the boundary check, which is logic “1 (ON)” when the access address is within the boundary area). For an LD instruction that hits the cache and whose access address is within the boundary area of the VST instruction, it sends the issue control circuit 16 an instruction (request issuance instruction) to the effect that “the cache HIT is changed to a cache MISS, so that data is not read from the cache; instead, a data acquisition request is issued and the data is fetched from the memory 30”. As a result, there is no possibility that the subsequent LD instruction reads an area on the cache that is rewritten by the VST instruction, and it becomes unnecessary to suppress the issuance of the subsequent LD instruction.
[0060]
Note that when LD instructions for the same address that are handled as a cache MISS are issued in succession within the boundary area, the issue check circuit 11 performs control so that a request to the memory 30 (miss request) is sent out only for the first of them (the “same address” here includes not only addresses that match exactly but also addresses that access the same data block). The subsequent LD instructions for the same address wait in the scalar processor 10 until the data of the first LD instruction is registered in the cache. Once that data is registered, the subsequent LD instructions for the same address are executed. Since the data is by then registered in the cache, they all read their data from the cache (they become cache HITs). This avoids “delay of data reading due to meaningless cache registration time”.
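The wait-on-same-block behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the 64-byte block size, the class name `IssueCheck`, and its method names are hypothetical and do not appear in the specification; only the existence of two REQ buffers (REQ buffer 0 and REQ buffer 1) is taken from the text.

```python
BLOCK_SIZE = 64      # assumed data block size in bytes (not given in the text)
NUM_REQ_BUFFERS = 2  # REQ buffer 0 and REQ buffer 1


def same_block(addr_a: int, addr_b: int) -> bool:
    """True when both addresses fall in the same data block."""
    return addr_a // BLOCK_SIZE == addr_b // BLOCK_SIZE


class IssueCheck:
    """Toy model of the REQ buffer group and the address comparison."""

    def __init__(self) -> None:
        self.pending = []  # access addresses of outstanding miss requests

    def must_wait(self, addr: int) -> bool:
        # An LD waits if an earlier miss request targets the same block.
        return any(same_block(addr, p) for p in self.pending)

    def record_miss(self, addr: int) -> bool:
        """Register a miss request; returns False if no buffer is free."""
        if len(self.pending) >= NUM_REQ_BUFFERS:
            return False
        self.pending.append(addr)
        return True

    def data_registered(self, addr: int) -> None:
        # The cache has been filled, so waiting LDs may now proceed.
        self.pending = [p for p in self.pending if not same_block(addr, p)]


ic = IssueCheck()
ic.record_miss(0x1000)       # first LD: a miss request goes to memory
print(ic.must_wait(0x1008))  # same block as the pending request
print(ic.must_wait(0x1040))  # different block
ic.data_registered(0x1000)
print(ic.must_wait(0x1008))  # pending request has been cleared
```

In this model only the first LD to a block produces a memory request; later LDs to that block are held until `data_registered` is called, after which they would be served from the cache.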
[0061]
(3) Here, the operation when an LD instruction is issued during the flush processing and boundary check resulting from execution of the VST instruction will be described in more detail, case by case (see FIG. 4 for the operation of the issue check circuit 11).
[0062]
A. When there is no preceding miss request
The LD instruction (LD request) issued from the processor core part enters the issue check circuit 11. At this time, the cache block 13 performs a cache HIT / MISS determination (determination of whether the LD instruction becomes a cache HIT or a cache MISS) based on the HIT / MISS determination request.
[0063]
The issue check circuit 11 determines whether there is a preceding miss request. In this case, since there is no preceding miss request, the LD instruction (request) is issued as is to the lower block through the issue suppression circuit 114.
[0064]
Next, the boundary check circuit 14 determines whether or not the access address of the LD instruction falls within the boundary area of the VST instruction (boundary area inside / outside determination).
[0065]
Here, based on the result of the cache HIT/MISS determination, the presence or absence of a preceding miss request (here, “no preceding miss request”), and the result of the boundary area inside/outside determination, the cases are divided into a to c below, and the following operations are executed.
[0066]
a. In the case of “cache HIT, no preceding miss request, outside the boundary area”, the cache data is read from the cache block 13 and returned to the processor core part as the data for the LD instruction.
[0067]
b. In the case of “cache HIT, no preceding miss request, within boundary area range”, the determination circuit 15 performs MISS conversion (conversion to change the cache HIT to the cache MISS), and issues a miss request (data acquisition request). An instruction (request issue instruction) is issued to the issuance control circuit 16 so as to be issued to the memory 30 side.
[0068]
c. In the case of “cache MISS, no preceding miss request, inside or outside the boundary area”, the LD instruction is issued to the memory 30 through the issue control circuit 16 (a miss request is issued). At this time, information on the issued miss request (miss request information indicating the access address of the LD instruction) is registered in the REQ buffer group 111 (REQ buffer 0 or REQ buffer 1 in FIG. 4) in the issue check circuit 11. The above operation is performed regardless of whether the access address of the LD instruction is within the boundary area.
[0069]
B. When there is a preceding miss request
The LD instruction issued from the processor core part enters the issue check circuit 11. At this time, the cache HIT / MISS determination is performed in the cache block 13.
[0070]
In the issue check circuit 11, the presence or absence of a preceding miss request is determined. In this case, since there is a preceding miss request, the comparator group 112 is used to determine, based on the miss request information stored in the REQ buffer group 111, whether the access address of the LD instruction matches the access address of the preceding LD instruction in the miss request information (address match determination).
[0071]
The boundary check circuit 14 determines whether or not the access address of the LD instruction falls within the boundary area of the VST instruction (boundary area inside / outside determination).
[0072]
Here, based on the results of the cache HIT/MISS determination, the presence or absence of a preceding miss request (in this case, “there is a preceding miss request”), the address match determination, and the boundary area inside/outside determination, the cases are divided into a to e below, and the following operations are executed.
[0073]
a. In the case of “cache HIT, preceding miss request present, no address match, outside the boundary area”, the cache data is read from the cache block 13 and returned to the processor core part as the data for the LD instruction.
[0074]
b. In the case of “cache HIT, preceding miss request present, no address match, inside the boundary area”, the determination circuit 15 performs MISS conversion (conversion that changes the cache HIT to a cache MISS) and sends an instruction (request issuance instruction) to the issue control circuit 16 so that a miss request (data acquisition request) is issued to the memory 30 side.
[0075]
c. In the case of “cache HIT, preceding miss request present, address match, inside or outside the boundary area”, the retry request signal generation circuit 113 generates a request wait request signal. Because of this signal, the LD instruction issued from the processor core part is not accepted by the cache and is registered in the wait buffer (this operation is the same whether or not the access address of the LD instruction is within the boundary area).
[0076]
d. In the case of “cache MISS, preceding miss request present, no address match, inside or outside the boundary area”, the LD instruction is issued to the memory 30 through the issue control circuit 16 (a miss request is issued). At this time, information on the issued miss request (miss request information indicating the access address of the LD instruction) is registered in the REQ buffer group 111 in the issue check circuit 11, in whichever of REQ buffer 0 and REQ buffer 1 is empty (when neither REQ buffer 0 nor REQ buffer 1 is empty, a request wait request signal is generated and the LD instruction is registered in the wait buffer). The above operation is performed regardless of whether the access address of the LD instruction is within the boundary area.
[0077]
e. In the case of “cache MISS, preceding miss request present, address match, inside or outside the boundary area”, the retry request signal generation circuit 113 generates a request wait request signal. Because of this signal, the LD instruction issued from the processor core part is not accepted by the cache and is registered in the wait buffer (this operation is the same whether or not the access address of the LD instruction is within the boundary area).
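The case analysis in A (a to c) and B (a to e) above can be condensed into one decision function. The following is a minimal sketch, not the circuit itself: the names `Action` and `handle_ld`, the parameter names, and the shorthand labels such as “A-a” and “B-d” used in the comments are introduced here for illustration only. The REQ-buffer-full check is applied only to a genuine cache MISS, because the text mentions it only for case d of B.

```python
from enum import Enum


class Action(Enum):
    READ_CACHE = "read data from the cache"
    MISS_REQUEST = "issue a miss request to memory"
    WAIT = "register the LD in the wait buffer"


def handle_ld(hit, in_boundary, preceding_miss=False,
              address_match=False, req_buffer_free=True):
    """Return the action for one LD issued during flush processing."""
    if preceding_miss and address_match:
        return Action.WAIT          # B-c and B-e: same address is pending
    if hit and not in_boundary:
        return Action.READ_CACHE    # A-a and B-a: ordinary cache HIT
    if hit and in_boundary:
        return Action.MISS_REQUEST  # A-b and B-b: MISS conversion
    if not req_buffer_free:
        return Action.WAIT          # B-d with both REQ buffers occupied
    return Action.MISS_REQUEST      # A-c and B-d: ordinary cache MISS


# A hit inside the boundary area is converted to a miss request.
print(handle_ld(hit=True, in_boundary=True).name)
```

Note that the boundary check only changes the outcome for a cache HIT with no matching preceding miss request; every other case behaves the same inside and outside the boundary area, as the text states.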
[0078]
(4) Next, with reference to FIG. 5, the operation of the flush processing circuit 12 and the cache block 13 in FIG. 1 at the time of a flush processing request will be described.
[0079]
When the VST instruction (request) is issued, the start address (S) and the distance (D) are set in the flush processing circuit 12.
[0080]
Thereafter, in the flush processing circuit 12, the adder 122 generates the addresses used to access the flush address array 121. The addresses generated are S, S + D, S + 2D, ..., S + (VL−1) × D.
[0081]
The flush address array 121, which is a copy of the address array 131 in the cache block 13, is accessed with these addresses, and the comparator 123 is used to determine whether the area to be rewritten by the VST instruction is registered in the cache.
[0082]
If it is determined that “an entry holding data in the area changed by the VST instruction is on the cache”, the flush processing circuit 12 sends a flush request (a request accompanied by an invalidation signal carrying the number of the entry to be invalidated) to the cache block 13.
[0083]
In the cache block 13, the V bit corresponding to the entry in the corresponding address array 131 is reset by this flush request.
[0084]
As a result, even if an LD instruction is executed after the end of the flush processing and its access address matches the address of an entry in the address array 131, the V bit has been reset, so the access becomes a cache MISS and the data is read from the memory 30.
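The address generation and V-bit reset described in (4) can be sketched in software as follows. The cache organization is an assumption made purely for illustration: a direct-mapped cache with `NUM_ENTRIES` entries and `BLOCK_SIZE`-byte blocks, with each flush address array entry holding the full block number. None of these parameters, nor the function name `flush_for_vst`, comes from the specification.

```python
BLOCK_SIZE = 64   # assumed block size in bytes
NUM_ENTRIES = 4   # assumed number of entries (direct-mapped, for illustration)


def flush_for_vst(faa_tags, v_bits, s, d, vl):
    """Reset the V bit of every entry touched by the VST write range.

    faa_tags: block number held by each entry (models the copy of the
              address array kept in the flush processing circuit).
    v_bits:   V bit of each entry; modified in place.
    Returns the entry numbers for which a flush request was generated.
    """
    flushed = []
    addr = s
    for _ in range(vl):
        block = addr // BLOCK_SIZE
        entry = block % NUM_ENTRIES
        # Comparator: is the block written by the VST registered here?
        if v_bits[entry] and faa_tags[entry] == block:
            v_bits[entry] = False   # flush request resets the V bit
            flushed.append(entry)
        addr += d                   # adder: S, S + D, ..., S + (VL - 1) * D
    return flushed


tags = [0, 5, 2, 7]
valid = [True, True, True, True]
print(flush_for_vst(tags, valid, s=320, d=128, vl=2))
print(valid)
```

With these inputs the VST writes addresses 320 and 448, which fall in blocks 5 and 7, so entries 1 and 3 are invalidated while entries 0 and 2 keep their V bits.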
[0085]
(5) Finally, operations of the boundary check circuit 14 and the determination circuit 15 in FIG. 1 will be described with reference to FIG. The determination circuit 15 is not present in the prior art and is newly added in the present invention.
[0086]
When the VST instruction is executed, the boundary check circuit 14 takes in the start address (S), the distance (D), and the vector length (VL) of the VST instruction, and the multiplier 141 and the adder 142 calculate the range of the boundary area. This calculation uses the formulas “boundary area lower limit = S” and “boundary area upper limit = S + D × VL”, and the resulting values are set in the boundary lower limit register 143 and the boundary upper limit register 144.
[0087]
When an LD instruction subsequent to the VST instruction is issued, the magnitude comparator 145 determines whether its access address is within the boundary area and outputs an in-boundary determination signal indicating the result.
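As a rough software analogue of the boundary check circuit 14, the two registers and the comparator can be modeled as below. This is a sketch under assumptions: the class and method names are hypothetical, D is assumed to be positive, and treating both limits as inclusive is a choice made here, since the text does not say whether the comparison at the upper limit is inclusive.

```python
from dataclasses import dataclass


@dataclass
class BoundaryCheck:
    """Models the boundary lower limit and upper limit registers."""
    lower: int = 0
    upper: int = 0
    active: bool = False  # True only while flush processing is in progress

    def set_from_vst(self, s: int, d: int, vl: int) -> None:
        # Multiplier forms D x VL; adder adds S (formulas from the text).
        self.lower = s
        self.upper = s + d * vl
        self.active = True

    def flush_done(self) -> None:
        # The boundary check ends when the flush processing ends.
        self.active = False

    def in_boundary(self, addr: int) -> bool:
        # Comparator; inclusive limits are an assumption of this sketch.
        return self.active and self.lower <= addr <= self.upper


bc = BoundaryCheck()
bc.set_from_vst(s=0x1000, d=8, vl=4)
print(hex(bc.lower), hex(bc.upper))
print(bc.in_boundary(0x1010), bc.in_boundary(0x0FFF))
```

Because the range is a single interval from S to S + D × VL, the check costs one multiplication, one addition, and two comparisons, at the price of also covering addresses between the strided elements that the VST instruction does not actually write.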
[0088]
Based on the HIT signal generated by the cache block 13 and the in-boundary determination signal generated by the boundary check circuit 14, the determination circuit 15 performs determination / processing as shown in the following a to d.
[0089]
a. If it is determined that “the access address of the subsequent LD instruction is outside the boundary area and the LD instruction is a cache HIT”, control is performed so that data is read from the cache (that is, no request issuance instruction is output).
[0090]
b. If it is determined that “the access address of the subsequent LD instruction is outside the boundary area and the LD instruction is a cache MISS”, a request issuance instruction is output to the issue control circuit 16 so that the issue control circuit 16 issues a data acquisition request (miss request) to the memory 30.
[0091]
c. If it is determined that “the access address of the subsequent LD instruction is within the boundary area and the LD instruction is a cache MISS”, a request issuance instruction is output to the issue control circuit 16, as in b above.
[0092]
d. If it is determined that “the access address of the subsequent LD instruction is within the boundary area and the LD instruction is a cache HIT”, a request issuance instruction is output to the issue control circuit 16 so that, despite the cache HIT, data is not read from the cache and a data acquisition request is issued to the memory 30 instead. As a result, if the access address of the subsequent LD instruction is within the boundary area, the MISS conversion from cache HIT to cache MISS is realized even when the LD instruction hits the cache, and a data acquisition request to the memory 30 is issued from the issue control circuit 16.
[0093]
In this way, in the present invention, when the access address is within the boundary area, a data acquisition request is issued to the memory 30 regardless of whether the LD instruction is a cache HIT or a cache MISS.
[0094]
In the present embodiment, the determination and processing described above are implemented by a NOT gate 151, which receives the HIT signal generated by the cache block 13 and outputs its logical negation, and an OR gate 152, which receives the output signal of the NOT gate 151 and the in-boundary determination signal generated by the boundary check circuit 14 and outputs their logical sum.
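The gate arrangement above reduces to a single Boolean expression, request issuance instruction = (NOT HIT) OR in-boundary. The short sketch below evaluates it for all four input combinations; the function name `request_issue` is introduced here for illustration only.

```python
def request_issue(hit: bool, in_boundary: bool) -> bool:
    """NOT gate on the HIT signal followed by an OR gate."""
    return (not hit) or in_boundary


# Enumerate the truth table covering the four cases of the description.
for hit in (True, False):
    for in_boundary in (False, True):
        print(f"HIT={int(hit)} in_boundary={int(in_boundary)} "
              f"request={int(request_issue(hit, in_boundary))}")
```

The only combination that yields no request is a cache HIT outside the boundary area, which is the one case in which data is read from the cache.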
[0095]
[Effect of the invention]
As described above, in the present invention, there is no need to suppress the issuance of a subsequent LD instruction even if its access address is within the boundary area of the preceding VST instruction. As a result, the instruction processing (execution) speed of a computer system having a scalar processor and a vector processor can be increased (processor performance is improved). In particular, the effect is pronounced when executing a program that frequently uses vector instructions.
[0096]
FIG. 7 is a diagram for specifically explaining the effect of the vector memory access time scalar memory access instruction issue control method of the present invention.
[0097]
In the specific example shown in FIG. 7, it is assumed that an instruction string consisting of a VST instruction, an LD instruction, and other instructions (subsequent instruction 1, subsequent instruction 2, and subsequent instruction 3) is executed in a loop (two iterations are shown in FIG. 7) and that the LD instruction accesses the write area of the VST instruction.
[0098]
In the conventional method shown in FIG. 7A, the LD instruction subsequent to the VST instruction cannot be issued until the flush processing is completed, so the instructions that follow it (subsequent instruction 1, subsequent instruction 2, and subsequent instruction 3) are all delayed.
[0099]
On the other hand, in the case of application of the present invention shown in FIG. 7B, the LD instruction can be issued immediately after the VST instruction is issued.
[0100]
Comparing the above, it can be clearly seen that a considerable difference in execution time (instruction execution speed difference) appears until the end of the second loop. This performance difference is further widened as the number of loops increases.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a scalar memory access instruction issuance control system for vector memory access according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the relationship between each component of the scalar memory access instruction issuance control system for vector memory access shown in FIG. 1 and the scalar processor, vector processor, and memory of a computer system to which the present invention is applied.
FIG. 3 is a block diagram showing an input / output mode of instructions and data among a scalar processor, a vector processor, and a memory.
FIG. 4 is a diagram showing a detailed configuration of the issue check circuit in FIG. 1.
FIG. 5 is a diagram showing a detailed configuration of the flush processing circuit and the cache block in FIG. 1.
FIG. 6 is a diagram showing a detailed configuration of the boundary check circuit and the determination circuit in FIG. 1.
FIG. 7 is a diagram for specifically explaining the effect of the scalar memory access instruction issuance control system for vector memory access according to the present invention.
[Explanation of symbols]
10 Scalar processor
11 Issue check circuit
12 Flush processing circuit
13 Cache block
14 Boundary check circuit
15 Determination circuit
16 Issue control circuit
20 Vector processor
30 Memory
111 REQ buffer group
112 Comparator group
113 Retry request signal generation circuit
114 Issue suppression circuit
121 Flush address array
122 Adder
123 Comparator
131 Address array
132 Data array
133 Comparator
134 Selector
141 Multiplier
142 Adder
143 Boundary lower limit register
144 Boundary upper limit register
145 Magnitude comparator
151 NOT gate
152 OR gate

Claims

In a computer system comprising a scalar processor having a cache block, a vector processor, and a memory,
Flush processing means for performing flush processing resulting from execution of a VST instruction;
Boundary check means for checking whether the access address of an LD instruction subsequent to the VST instruction is within the boundary area of the VST instruction;
Determination means for performing control, when the LD instruction subsequent to the VST instruction is issued, to change the cache HIT to a cache MISS if it is determined, based on the result of the boundary check by the boundary check means for the VST instruction and the result of the cache HIT/MISS determination by the cache block, that “the subsequent LD instruction is a cache HIT and its access address is within the boundary area of the VST instruction”; and
The cache block, which reads and returns data for the subsequent LD instruction in accordance with the cache HIT determination when the determination means determines that “the subsequent LD instruction is a cache HIT and its access address is not within the boundary area of the VST instruction”;
A scalar memory access instruction issuance control system for vector memory access, comprising the above.

In a scalar processor of a computer system including a scalar processor, a vector processor, and a memory,
A cache block comprising a cache and its control circuit and performing cache HIT / MISS determination;
An issue check circuit that accepts instructions from the processor core part and, when an LD instruction is received, checks whether there is a preceding miss request with the same access address as that LD instruction and, if there is, makes the LD instruction wait for issuance;
A flush processing circuit that calculates the write range of the VST instruction based on S, D, and VL of the VST instruction, determines whether data within that range is already registered in the cache, and, if it is determined to be registered, sends a flush request to the cache block so that the V bit for the registered portion on the cache block is reset;
A boundary check circuit that holds information indicating the range of the boundary area of the VST instruction calculated based on S, D, and VL of the VST instruction, and executes a boundary check, which determines whether a subsequent LD instruction accesses the boundary area, until the flush processing for the VST instruction in the flush processing circuit is completed;
An issue control circuit that outputs to the memory side a data acquisition request for an LD instruction that has become a cache MISS; and
A determination circuit that, when it is determined based on the signals from the cache block and the boundary check circuit that “the LD instruction subsequent to the VST instruction is a cache HIT and its access address is within the boundary area of the VST instruction”, performs control to change the subsequent LD instruction from cache HIT to cache MISS and sends an instruction for that control to the issue control circuit,
Wherein, when it is determined that “the subsequent LD instruction is a cache HIT and its access address is not within the boundary area of the VST instruction”, the cache block reads and returns data for the subsequent LD instruction in accordance with that cache HIT determination;
A scalar memory access instruction issuance control system for vector memory access, comprising the above.

The scalar memory access instruction issuance control system for vector memory access according to claim 2, comprising:
A cache block that has an address array to which V bits are added and a data array, performs cache HIT/MISS determination by checking whether the access address of the LD instruction matches the address in each entry of the address array, and outputs a HIT signal indicating the determination result; and
A flush processing circuit that, using a flush address array that is a copy of the address array in the cache block, obtains the write area of the VST instruction based on S, D, and VL of the VST instruction, searches the address array in the cache block, and performs flush processing on the V bits.

The scalar memory access instruction issuance control system for vector memory access according to claim 3, comprising a boundary check circuit that includes:
A multiplier and an adder for calculating the range of the boundary area from S, D, and VL of the VST instruction using the formulas “boundary area lower limit = S” and “boundary area upper limit = S + D × VL”;
A boundary lower limit register and a boundary upper limit register for holding the lower limit and the upper limit of the range of the boundary area calculated by the multiplier and the adder; and
A magnitude comparator that compares the boundary area range represented by the values set in the boundary lower limit register and the boundary upper limit register with the access address of the LD instruction subsequent to the VST instruction, and outputs an in-boundary determination signal indicating the comparison result.

The scalar memory access instruction issuance control system for vector memory access according to claim 4, comprising a determination circuit that recognizes, based on the HIT signal generated by the cache block, whether the LD instruction is a cache HIT or a cache MISS, recognizes, based on the in-boundary determination signal generated by the boundary check circuit, whether the access address is inside or outside the boundary area, and, in order to realize the control “if the access address of the LD instruction following the VST instruction is within the boundary area of the VST instruction, issue a data acquisition request to the memory without reading data from the cache even if the LD instruction hits the cache”, outputs a request issuance instruction to the issue control circuit in the same way as in the case of a cache MISS.

The scalar memory access instruction issuance control system for vector memory access according to claim 5, wherein the determination circuit comprises a NOT gate that receives the HIT signal generated by the cache block and outputs its logical negation, and an OR gate that receives the output signal of the NOT gate and the in-boundary determination signal generated by the boundary check circuit and outputs their logical sum.

The scalar memory access instruction issuance control system for vector memory access according to claim 4, claim 5, or claim 6, comprising an issue check circuit that includes a REQ buffer group that holds miss request information for preceding miss requests, a comparator group, a retry request signal generation circuit, and an issue suppression circuit, and that performs control so that, when LD instructions for the same address are issued in succession within the boundary area, a request is sent to the memory only for the first of them regardless of cache HIT / cache MISS.