JP2007509439A

JP2007509439A - Method and apparatus for efficient ordered stores in an interconnection network

Info

Publication number: JP2007509439A
Application number: JP2006536679A
Authority: JP
Inventors: チャーニー，マーク; ラジワール，ラヴィ; アフジャ，プリトパル; マッティナ，マット
Original assignee: インテルコーポレイション
Priority date: 2003-10-22
Filing date: 2004-10-15
Publication date: 2007-04-12
Anticipated expiration: 2024-10-15
Also published as: DE112004001984T5; WO2005041047A3; KR100841130B1; JP4658064B2; KR20060063994A; WO2005041047A2

Abstract

A physically distributed cache memory system has an interconnect network, a first level cache memory slice, and a second level cache memory slice. The first level cache memory slice is coupled to the interconnect network and generates tagged ordered store requests. Each tagged ordered store request has a tag that includes a requester identification and a store sequence token. The second level cache memory slice is coupled to the interconnect network and executes the ordered store requests in order across the physically distributed cache memory system in response to the tag of each tagged ordered store request.

Description

The present invention relates generally to cache memory management, and more particularly to providing ordered stores to a shared distributed cache memory system over an unordered network.

It is well known to place a fast cache memory between a processor and a slow main memory to improve the average access time to the main memory. Using a cache memory can improve the execution performance of the processor. Cache memory was initially separate from the processor but later became an integral part of it as technology improved. With the cache memory integrated into the processor, the access time to the cache memory can be reduced further.

Multiple levels of cache memory have been introduced between the processor and main memory. In general, the closer a cache memory is to the processor, the faster it is and the smaller its size. Put another way, the farther a cache memory is from the processor, the larger its size and the longer its access time generally are. However, multiple levels of cache memory complicate cache management, particularly when an instruction branches or jumps to another instruction or to an address in memory.

A memory controller or cache controller, internal or external to the processor, has been used to provide cache management of the cache memory between the main memory and the processor. Various cache memory management algorithms have been introduced to maximize the use of the cache memory and to reduce the number of cache misses that require the processor to read data or instructions from, or write them to, the slow main memory. Cache coherence protocols have been introduced to maintain the coherence of data stored in cache memories by tracking the state of data blocks that may be shared. Other cache memory management algorithms have also been introduced.

In the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described so as not to obscure aspects of the present invention unnecessarily.

One embodiment of the present invention addresses a system with multiple processors that share a logically shared but physically distributed cache. The processors communicate with the physically distributed cache over an interconnect network. The interconnect network is an unordered network: it does not preserve the order of requests from one processor or cache (a “requester”) to the same or different caches. Messages that one cache sends to another cache over the interconnect network are likewise not kept in order by the network. However, some messages may need to be executed in the order in which the requester sent them. Such messages may be referred to as ordered requests. Messages that do not need to be executed in order may be referred to as non-ordered requests. A store request issued by a requester may be an ordered store request or a non-ordered store request.

Ordered store requests are a subset of ordered requests. An ordered store request, described further below, is a store request of a requester that must be executed in order with respect to the other ordered store requests issued by that requester. An earlier ordered store request must be processed before the current ordered store request is processed. Likewise, the current ordered store request must be processed before a later ordered store request is processed. A non-ordered store request is a store request of a requester that can be executed out of order with respect to the requester's other ordered requests.

A logically shared cache memory may be partitioned so that particular address blocks of the cache memory are managed by different chunks of the physically distributed cache memory.

In another embodiment, cache management of the logically shared cache addresses the ordering requirements that a particular memory consistency model places on the processing of certain stores from a processor to main memory. A store that requires special in-order processing is referred to herein as an “ordered store” or an “ordered store request”. Other stores may not require special in-order processing and are referred to herein as “unordered stores”, “unordered store requests”, or “non-ordered store requests”. These non-ordered store requests can be executed or processed out of order. Processing an ordered store request requires that the earlier ordered store requests issued before the current ordered store request be completely processed before the current ordered store request is executed.

A simple, low-performance method of handling ordered store requests from one processor is to prevent the processor from issuing a new ordered store request until all of its previous ordered store requests have been processed by the cache memory system. Handling ordered store requests from multiple processors in a multiprocessor system is not so simple, and the method becomes more complicated.
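The simple stalling method described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `StallingRequester` and its methods are hypothetical and do not come from the patent text, and completion is modeled as a single callback.

```python
class StallingRequester:
    """Hypothetical model of the simple method: at most one ordered store
    may be outstanding, so a new one is refused until the previous one has
    been processed by the cache memory system."""

    def __init__(self) -> None:
        self.outstanding = False  # True while an ordered store is in flight

    def try_issue(self) -> bool:
        """Issue an ordered store if none is outstanding; otherwise stall."""
        if self.outstanding:
            return False  # the processor must wait
        self.outstanding = True
        return True

    def on_complete(self) -> None:
        """Called when the cache system reports the store as processed."""
        self.outstanding = False
```

Under this model a requester never overlaps two ordered stores, which is why the method is simple but slow.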

In an embodiment of the present invention, the simple, low-performance method of handling ordered store requests is improved upon for multiple processors so that it exploits the parallelism available in a network of processors with a shared memory system, using a small amount of additional request tracking hardware to support a multiprocessor system. In another embodiment of the present invention, the parallelism of the network of processors allows multiple ordered store requests from one processor in the network to be processed at the same time or in overlapping time intervals. A processor does not need to wait for a previous ordered store to complete fully before sending a new ordered store request to the cache system over the unordered network.

Referring to FIG. 1, a block diagram of a typical computer system 100 in which the present invention may be used is illustrated. The computer system 100 includes a central processing unit (CPU) 101, input/output devices (I/O) 102 (such as a keyboard, modem, printer, or external storage device), monitoring devices (M) 103 (such as a CRT or graphics display), and a memory 104 for storing information. The monitoring devices (M) 103 provide computer information in a human-intelligible format such as a visual or audio format. The system 100 may be any of a number of different systems, including a computer system or a network processing system such as a media access controller (MAC).

Referring to FIG. 2A, a block diagram of a central processing unit 101A in which embodiments of the present invention may be used is illustrated. The central processing unit 101A includes a microprocessor 201, a main memory 202 for storing program instructions, and a disk storage device 203, coupled together as shown. The microprocessor 201 includes one or more execution units 210, at least one cache memory 212, and a cache controller 214. The microprocessor 201 may include a separate memory controller 216 to control accesses to the main memory 202. In this case, the memory controller interfaces the other elements of the microprocessor 201 to the main memory 202. Ideally, the execution unit 210 reads and writes data in the cache memory 212 without having to access the slower main memory 202 directly. That is, it is desirable for the execution unit to avoid misses to the cache memory 212. There are physical limits to the size of the cache memory 212. In a multiprocessor system, however, the system can be designed so that the cache memories 212 internal to each processor are logically shared. In another embodiment, in addition to the one or more internal cache memories within a processor, one or more external cache memories may be provided and logically shared by multiple processors over the interconnect network of the multiprocessor system.

The disk storage device 203 may be a floppy disk, ZIP disk, DVD disk, hard disk, rewritable optical disk, flash memory, or other non-volatile storage device. The microprocessor 201 and the disk storage device 203 can both read and write information in the memory 202 over a memory bus. Thus, both the microprocessor 201 and the disk storage device 203 can alter memory locations within the memory 202 during program execution. For the disk storage device 203 to do this directly, it includes a disk controller with direct memory access, which can perform stores into memory and thereby modify code. Because the controller can access the memory directly, it is an example of a direct memory access (DMA) agent. Other devices that have direct access to store information into memory are also DMA agents. The memory 202 is typically dynamic random access memory (DRAM) but may be another type of rewritable storage.

Upon initial execution of a program stored in the disk storage device 203 or in some other source (such as the I/O devices 102), the microprocessor 201 reads the program instructions and data stored in the disk storage device 203 or other source and writes them into the memory 202. One or more pages of the program instructions stored in the memory 202, or portions thereof, are read (that is, “fetched”) by the microprocessor 201 for storage in an instruction cache (not shown in FIG. 3). Some of the program instructions stored in the instruction cache may be read into an instruction pipeline (not shown) for execution by the microprocessor 201. One or more pages of the data stored in the memory 202, or portions thereof, may be read (that is, “fetched”) by the microprocessor 201 for storage in a data cache. In other embodiments, both instructions and data may be stored in the same cache memory.

Referring to FIG. 2B, a block diagram of a multiprocessor system 101B in which embodiments of the present invention may be used is illustrated. The multiprocessor system 101B may be a multiprocessor central processing unit. The multiprocessor system 101B includes a plurality of processors 201A-201J. Each of the processors 201A-201J includes one or more execution units 210A-210N. An execution unit may also be referred to as a core. Each of the processors 201A-201J may further include one or more levels of internal cache memory slices (CMS) 212A-212M coupled to the one or more execution units 210A-210N. Each of the processors 201A-201J may be coupled to I/O devices and/or monitoring devices.

The multiprocessor system 101B further includes one or more levels of external cache memory slices (CMS) 212A'-212L' that are coupled to one another and to the plurality of processors 201A-201J through an interconnect network 250. The multiprocessor system 101B may further include one or more main memories 202A-202K coupled to the interconnect network 250 and a disk storage device 203 coupled to the interconnect network 250.

The processors 201A-201J, the cache memory slices 212A'-212L', and the disk storage device 203 may directly read and write information in the main memories 202A-202K. That is, the main memories 202A-202K can be shared by the processors 201A-201J, the cache memory slices 212A'-212L', and the disk storage device 203. In addition, messages may be passed among the processors 201A-201J, the main memories 202A-202K, the cache memory slices 212A'-212L', and the disk storage device 203 over the interconnect network 250. By using messaging over the interconnect network 250, in-order execution or processing of ordered store requests may be provided in the multiprocessor system 101B.

Referring to FIG. 3A, a block diagram of a multiprocessor system 101C is illustrated. The multiprocessor system 101C may include a primary interconnect network 300A; a plurality of processors 301A-301J, each having an internal cache memory 312A; one or more upper level cache memories 312B coupled between the plurality of processors 301A-301J and the interconnect network 300A; a level of cache memory slices 312C coupled to the interconnect network 300A; another level of cache memory slices 312D coupled to the interconnect network 300A; and yet another level of cache memory slices 312E coupled to the primary interconnect network 300A through a secondary interconnect network 300B.

The multiprocessor system 101C may further include one or more main memories 302A, 302B, and/or 302C. The main memory 302A may be coupled directly to the primary interconnect network 300A. The main memory 302B may be coupled to the primary interconnect network 300A through the secondary interconnect network 300B. The main memory 302C may be coupled to the lower level cache memory slice 312E and to the primary interconnect network 300A through the secondary interconnect network 300B.

The internal cache memories 312A, the one or more upper level cache memories 312B, the level of cache memory slices 312C, the level of cache memory slices 312D, and the level of cache memory slices 312E may form an embodiment of a physically distributed multi-level cache memory system. With the main memories 302A and 302B included alongside the cache memory slices, an embodiment of a physically distributed multi-level memory system is provided.

The processors, cache memory slices, and main memories may be considered nodes of the interconnect network. Messages may flow across the interconnect network from one node to another node or may be broadcast from one node to all other nodes. The topology of the multiprocessor system 101C and of the interconnect networks 300A and 300B may be a bus network topology, a tree network topology, a ring network topology, a grid or mesh network topology, a torus network topology, a hypercube network topology, a fully connected topology, or a combination thereof.

The interconnect networks 300A and 300B may be wire traces routed on an integrated circuit, buses routed on the same integrated circuit, and/or one or more switches between functional blocks of the same integrated circuit. Alternatively, the interconnect networks 300A and 300B may be wire traces routed between integrated circuits, buses between integrated circuits, and/or one or more switches between integrated circuits. A switch, bridge, or router (not shown) may be used to interconnect the primary interconnect network 300A and the secondary interconnect network 300B so that messages can pass back and forth between them accordingly.

As messages flow through an interconnect network, they may experience different delays when routed from node to node or from one node to all nodes. These differing delays can cause messages to be delivered out of order. That is, the interconnect network is an unordered network with respect to the processing of ordered store requests.

Referring to FIG. 3B, a block diagram of a multiprocessor system 101C' is illustrated. FIG. 3B shows that most of the system 101C of FIG. 3A, including the primary interconnect network 300A, may be part of a single monolithic integrated circuit (IC) chip 350. That is, with the exception of the main memory 302C, the elements of the system 101C may be integrated together onto a single silicon chip 350, as shown in the system 101C'.

Referring to FIG. 3C, a block diagram of a multiprocessor system 101C'' is illustrated. FIG. 3C shows that the system 101C may be partitioned across integrated circuit boundaries, with the primary interconnect network 300A being part of a plurality of integrated circuit (IC) chips 360A-360B. The elements of the system 101C may be integrated together onto multiple silicon chips. Alternatively, the elements of the multiprocessor system 101C'' may be one or more printed circuit boards electrically coupled to one another, for example through a common backplane or through the traces of a motherboard printed circuit board (PCB).

Referring to FIG. 4, a block diagram showing a logical view of a physically distributed cache memory system 400 is illustrated. The physically distributed cache memory system 400 includes address hash control logic 404 that generates a hashed address, and one or more cache memory slices 412A-412K that receive messages at the hashed address over the primary interconnect network 300A, or over the primary interconnect network 300A and the secondary interconnect network 300B. Each of the one or more cache memory slices 412A-412K includes one or more blocks of memory cells 414A-414K.

The physically distributed cache memory system 400 is shared by requesters such as processors or cache memories. The physically distributed cache memory system 400 may be partitioned in a number of ways so that one address block of memory cells is associated with one cache memory slice and the next address block of memory cells is associated with another cache memory slice. An address 402 from a requester is hashed by the address hash logic 404 to select a cache memory slice and one or more blocks of its memory cells.
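The slice selection can be sketched as follows. The text does not specify the hash function, so this assumes a plain modulo interleave of consecutive address blocks across the slices; the name `select_slice` and the 64-byte block size are hypothetical.

```python
BLOCK_BYTES = 64  # assumed address block size; not given in the text


def select_slice(address: int, num_slices: int) -> tuple[int, int]:
    """Map an address to (slice index, block index within that slice).

    Consecutive address blocks land on consecutive slices, so one block
    is associated with one slice and the next block with another slice.
    """
    block = address // BLOCK_BYTES
    return block % num_slices, block // num_slices
```

With four slices, addresses 0, 64, 128, and 192 fall on slices 0 through 3, and address 256 returns to slice 0 as that slice's second block. A real design may use a different hash to spread strided access patterns more evenly.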

Referring to FIGS. 3A and 4, one or more of the processors 301A-301J can request that an ordered store request be carried out by the physically distributed cache system 400. Alternatively, one or more of the cache memories 312A and 312B, or cache memory slices at an upper level of the hierarchy of the distributed cache memory system 400, can request that an ordered store request be carried out by another level of the distributed cache memory system 400. A cache memory slice that makes such a request is generally close to a processor. A cache memory that makes such a request may include the internal cache memory 312A of a processor or the upper level cache memory 312B. The processors, cache memories, and cache memory slices that request ordered stores may collectively be referred to as requesters. Each requester has control logic and other hardware elements to generate ordered store requests. In the following description, “Nc” denotes the number of cache memory slices that make up the physically distributed cache, and “Np” denotes the number of requesters that share the distributed cache.

Referring momentarily to FIG. 7, a processor/cache requester 701 at one level of the memory hierarchy is illustrated communicating with cache memory slices 702A and 702B at a different level of the memory hierarchy, which can carry out in-order execution of ordered store requests.

Each processor/cache requester 701 has a unique requester identifier (“RID”) 704 with a constant value “j” and a single token register (“TR”) 706 with a variable value “t”. The unique requester identifier may also be referred to as a unique requester identification. The token register may be referred to as a sequence token register, and the token value “t” may also be referred to as a store sequence token or a store sequence number. The token register (“TR”) 706 is “b” bits wide and can therefore hold 2^b distinct values; the width depends on the number of outstanding ordered store requests that the processor/requester supports. If “S” denotes the number of outstanding ordered store requests that each processor supports, the number of bits in the token register can be determined from the equation b = ceiling[log₂(S)]. The value held in the token register may also be referred to as the requester sequence token. The token register may be incremented each time an ordered store request is generated. When incremented past its maximum value, the token register can wrap around (that is, roll over) to its initial value (typically 0). In one embodiment, however, S is large enough, and therefore the number of bits “b” is large enough, relative to the maximum network latency (that is, the maximum network delay) that everything outstanding has been processed by the time the token register rolls over. In another embodiment, a processor/requester whose TR register is about to roll over polls each cache memory slice to determine whether each has processed all tagged memory requests up to S-1. Once all the cache memory slices have responded to the processor that they are finished, the processor may allow its TR register to roll over.
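The sizing rule and the wraparound behavior can be sketched as follows. The names `token_bits` and `TokenRegister` are hypothetical, and the choice to return the token before incrementing is an assumption; the text only says that the register may be incremented each time an ordered store request is generated.

```python
def token_bits(s: int) -> int:
    """Return b = ceiling(log2(S)) for S >= 1 outstanding ordered stores.

    For positive integers, (S - 1).bit_length() equals ceiling(log2(S))
    and avoids floating-point rounding.
    """
    if s < 1:
        raise ValueError("S must be at least 1")
    return (s - 1).bit_length()


class TokenRegister:
    """A b-bit sequence token register that wraps around to 0."""

    def __init__(self, bits: int) -> None:
        self.bits = bits
        self.value = 0  # assumed initial value, "typically 0" in the text

    def next_token(self) -> int:
        """Return the current token, then increment modulo 2**b."""
        token = self.value
        self.value = (self.value + 1) % (1 << self.bits)
        return token
```

For example, S = 4 gives b = 2, so the tokens cycle through 0, 1, 2, 3 and then roll over to 0. The polling step that guards the rollover in the second embodiment is not modeled here.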

The value “j” of the requester identifier (“RID”) 704 is unique. That is, no two requester identifiers have the same value within the same multiprocessor system with a distributed cache memory system. Because the value “j” of each requester identifier (“RID”) 704 is unique, the token register value “t” of each requester 701 can be made unique by appending “t” to “j”. That is, the token register TR can be made “unique” by adding the requester identifier to the token before it is communicated to the distributed cache memory system over the interconnect network.
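One way to form such a unique tag is to place “j” in the high bits and “t” in the low “b” bits of a single integer. This is a sketch under that assumed encoding; the function names `make_tag` and `split_tag` are hypothetical, and the text does not fix a bit layout.

```python
def make_tag(j: int, t: int, b: int) -> int:
    """Concatenate requester id j with a b-bit token t (j high, t low)."""
    if not 0 <= t < (1 << b):
        raise ValueError("token does not fit in b bits")
    return (j << b) | t


def split_tag(tag: int, b: int) -> tuple[int, int]:
    """Recover (j, t) from a tag built by make_tag."""
    return tag >> b, tag & ((1 << b) - 1)
```

Two requesters that happen to hold the same token value still produce different tags, because their “j” values differ.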

Each cache memory slice in the memory hierarchy of the distributed cache memory that can execute ordered store requests in order (such as cache memory slices 702A and 702B) has a cache sequence array (CSA) 712. The cache sequence array 712 is a table with “Np” entries, each “b” bits wide. For each requestor identifier (“RID”) 704, the CSA 712 identifies the next ordered store that the given cache memory slice of the distributed cache system can process. Because there are Np requestors, the CSA has Np entries.

Reference is now made to FIGS. 5A-5B. FIG. 5A shows the general fields of a tagged ordered store request 500. FIG. 5B shows the general fields of a CSA update 510. To support ordered store requests, three bit fields are used in both the tagged ordered store request 500 and the CSA update 510: a requestor identifier (RID) field 501 (the value “j”), a token register value field 502 (the value “t”), and a message identifier (MID) field 504. The RID field 501 and the token register value field 502 may together be called the TRU 503. That is, the TRU 503 represents the concatenation of the requestor id “j” with the value “t” of requestor j's token register TR. The value of the TRU 503 may be written “j.t”, where “j” is the requestor identifier and “t” is the value of requestor j's token register TR.

The message identifier (MID) field 504 is a code that indicates either an ordered store request (OSR) 504A or a CSA update 504B. Other codes in the MID field 504 may be used to indicate other message types.

When the message identifier field 504 carries the ordered store request (OSR) code 504A, an address field 505 and a data field 506 are included as part of the tagged ordered store request 500. In other words, the RID field 501 (value “j”) and the token register value field 502 (value “t”) are concatenated and attached to the ordered store request code 504A together with the address 505 and the data 506 to be stored. The tagged ordered store request 500 is formed in this way.

When the message identifier field 504 carries the CSA update code 504B rather than the OSR code 504A, the address field 505 and the data field 506 are not included in the message sent to the distributed cache memory system 400. In this case, the RID field 501 (value “j”) and the token register value field 502 (value “t”), derived from the ordered store request that was processed, are attached to the CSA update code 504B.
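The two message layouts of FIGS. 5A and 5B can be sketched as a single record type. This is only an illustration: the class names, the enum values chosen for the MID codes, and the helper functions are assumptions, since the description does not give concrete encodings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MessageId(Enum):
    """Hypothetical codes for the MID field 504."""
    OSR = "ordered_store_request"    # code 504A
    CSA_UPDATE = "csa_update"        # code 504B


@dataclass(frozen=True)
class Message:
    rid: int                         # RID field 501, value "j"
    token: int                       # token register value field 502, value "t"
    mid: MessageId                   # MID field 504
    address: Optional[int] = None    # address field 505, OSR only
    data: Optional[bytes] = None     # data field 506, OSR only

    def tru(self) -> str:
        """TRU 503: the concatenation of j and t, written 'j.t'."""
        return f"{self.rid}.{self.token}"


def make_ordered_store(rid: int, token: int, address: int, data: bytes) -> Message:
    """Tagged ordered store request 500: tag, OSR code, address and data."""
    return Message(rid, token, MessageId.OSR, address, data)


def make_csa_update(rid: int, token: int) -> Message:
    """CSA update 510: tag and CSA update code only, no address or data."""
    return Message(rid, token, MessageId.CSA_UPDATE)
```

Keeping the address and data optional mirrors the text: both messages share the RID, token, and MID fields, and only the ordered store request carries a payload.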

In one embodiment, the data bit fields of the tagged ordered store request 500 and the CSA update 510 travel as packets over the interconnect networks 300A, 300B, from a requestor to a cache memory slice or from one cache memory slice to another. In another embodiment, they travel in parallel over a parallel interconnect bus of the interconnect network. In another embodiment, they travel serially over a serial interconnect of the interconnect network. In yet another embodiment, they travel over the interconnect network as a combination of one or more packets, in parallel or serially. In any case, the tagged ordered store request 500 is generated by a requestor and sent onto the interconnect network, and the CSA update 510 is generated by the cache memory slice that executed the ordered store request and sent onto the interconnect network.

Referring to FIG. 6A, a block diagram of a cache memory slice 602 is shown. Cache memory slice 602 represents a single instance of a cache memory slice. It includes a cache sequence array 604 and cache control logic 606 that support in-order execution of ordered store requests. The cache control logic 606 can also provide the general cache control functions associated with a cache memory. The cache memory slice 602 further includes a request buffer 608, cache tag bits 610, a cache data array 612, tag match logic 614, and a column select 616, coupled together as shown in FIG. 6A.

The request buffer 608 temporarily holds cache store requests for processing in a queue. The cache tag bits 610 are typically the upper address bits which, together with a valid bit and other status bits, help identify the contents of a cache line of memory cells. The cache data array 612 is an array of rows and columns of memory cells for storing data. The tag match logic 614 determines whether there is a hit or a miss in the given cache memory slice. A hit indicates that the desired data is stored in the cache data array 612 of the given cache memory slice. A miss indicates that the desired data is not stored in the cache data array 612 and that the request must be passed to the next level of the hierarchy of the distributed cache memory system. The column select 616 responds to the hit or miss indication in determining whether a column of memory cells should be selected from the cache data array 612.

The cache sequence array 604 enables the cache memory slice 602 to execute ordered store requests in order across the physically distributed cache memory system. The cache sequence array 604 stores, as cache sequence entries, one or more store sequence tokens associated with one or more ordered store requests.

Referring to FIG. 6B, a block diagram of the cache sequence array (CSA) 604 is shown. The CSA 604 includes a cache sequence array (CSA) table 632, which stores a store sequence token t for each requestor j. The requestor identifier j serves as the address into the CSA table 632, and the data addressed in this way is sent to the cache control logic 606.

Each cache sequence entry in the CSA table 632 indicates the current store sequence tag t that the given cache memory slice may execute in order from the given requestor j. If an ordered store request hashed to the given cache memory slice 602 matches the cache sequence entry for its requestor j, the cache memory slice executes the ordered store request. If an ordered store request hashed to the given cache memory slice 602 does not match the cache sequence entry for its requestor j, the cache memory slice does not execute it at that time, but instead holds it in the request buffer 608 or another queue for later processing. In this way, ordered store requests are executed in order.

The CSA table 632 within a cache memory slice maintains one ordered store request entry per requestor. In this way, each cache memory slice can maintain the execution order of ordered store requests for each requestor j.

Referring to FIG. 7, a block diagram shows an exemplary sequence of in-order execution of ordered store requests, which includes generating tagged ordered store requests, executing the ordered store requests, and issuing cache sequence update messages.

As described above, each processor/cache requestor 701 has a unique requestor identifier (“RID”) 704 with a value “j” and a single token register (“TR”) 706 with a value “t”. Each processor/cache requestor 701 further includes a work queue 707 that holds ordered store requests (for example, ST.REL A and ST.REL B), and control logic 708 that controls generation of the tagged ordered store request 500 and hashes or translates the address to select the appropriate cache memory slice and the memory cells within it.

As described above, each cache memory slice in the memory hierarchy of the distributed cache memory that can execute ordered store requests in order has a cache sequence array (CSA). FIG. 7 shows cache memory slice k 702A and cache memory slice m 702B, each with a cache sequence array (CSA) 712.

In operation, requestor j 701 generates a tagged ordered store request 500 by taking the address of one of the ordered store requests in the queue 707 and attaching the requestor ID j and the current token register value t. The controller 708 of requestor j 701 issues the tagged ordered store request 500, which is tagged with the value “j.t”. At time X, as indicated by arrow 721, the tagged ordered store request for ST.REL A is sent to cache slice k 702A. After it has been sent, the token register 706 in requestor j 701 is incremented to the value (t+1).

For example, assume that requestor j 701 has two ordered store requests, denoted “ST.REL A” and “ST.REL B”, to different addresses “A” and “B”, ready to be tagged and issued to the physically distributed cache memory system as tagged ordered store requests. The ordered store request “ST.REL A” is older than “ST.REL B” and must be processed first to achieve in-order execution. However, because addresses “A” and “B” differ, the two ordered store requests are handled by different parts of the physically distributed cache memory system: cache memory slice k 702A and cache memory slice m 702B.

First, as indicated by arrow 721, requestor j 701 issues the ordered store request “ST.REL A”, tagged with “j.t”, to cache memory slice k 702A. When cache memory slice k 702A processes this tagged ordered store request, it performs the store and then performs an update. That is, as indicated by arrow 722, cache memory slice k 702A broadcasts a cache sequence array (CSA) update carrying “j.x”, where x = t+1, to all other cache memory slices. Cache memory slice k 702A also increments its own CSA[j] entry for requestor j to apply the CSA update locally.

On receiving the tagged ordered store request for ST.REL A, cache memory slice k 702A determines whether it can execute the request in order. To do so, cache slice k 702A examines the entry for requestor j 701 in its cache sequence array (CSA) 712. How cache memory slice k 702A makes this determination is described further below with reference to FIGS. 9A and 9B. Assuming that cache memory slice k 702A determines that it can execute the tagged ordered store request in order, it does so. After cache memory slice k 702A has processed or executed the tagged ordered store request for ST.REL A, the value t is incremented to (t+1) and combined with the requestor ID j, and a CSA update 510 with the value j.t+1 is generated and issued to all other cache memory slices. Arrow 722 indicates that the CSA update is sent to all other cache memory slices, including cache memory slice m 702B. The update indicates that a cache receiving it may process a tagged ordered store request with the value “j.t+1”.

Tagged ordered store requests can arrive out of order at a given cache memory slice for various reasons. For example, a CSA update may not have been received at the given cache memory slice in time. As another example, another tagged ordered store request may have been issued before the previous ordered store request was completely processed and its CSA update issued. As yet another example, a later tagged ordered store request may have been received before all CSA updates for the given requestor j were received from the other cache memory slices.

Continuing the example, at time X+e, where e is a positive number, requestor j issues a tagged ordered store request containing the ordered store request “ST.REL B”, tagged with “j.(t+1)”, to cache memory slice m 702B, as indicated by arrow 723. After the tagged ordered store request for ST.REL B has been sent to cache slice m 702B, the token register 706 in requestor j 701 is incremented to the value (t+2).

On receiving the tagged ordered store request for ST.REL B, cache memory slice m 702B determines whether it can execute the request in order. To do so, cache slice m 702B examines the entry for requestor j 701 in its cache sequence array (CSA) 712.

Cache memory slice m 702B checks whether its CSA 712 entry for requestor j, CSA[j], equals (t+1). In this case, assuming that cache memory slice k 702A has already processed the ordered store request “ST.REL A” and that cache memory slice m 702B has received the corresponding CSA update, cache memory slice m 702B can process “ST.REL B” because its CSA[j] entry for requestor j equals (t+1).

However, assume instead that requestor j issues the ordered store request “ST.REL B” before the CSA update from cache memory slice k 702A reaches cache memory slice m 702B. Cache memory slice m 702B then keeps the tagged ordered store request containing “ST.REL B” in the network or in a local buffer until the CSA update with token “j.x” arrives. In this case the CSA update arrives out of order, and the cache memory slice must correctly handle the ordered store requests it holds for processing.

Assuming that cache memory slice m 702B determines that it can execute the tagged ordered store request in order, it does so. After cache memory slice m 702B has processed or executed the tagged ordered store request for ST.REL B, the value (t+1) is incremented to (t+2) and combined with the requestor ID j, and a CSA update 510 with the value j.t+2 is generated and issued to all other cache memory slices. Arrow 724 indicates that the CSA update is sent to all other cache memory slices, including cache memory slice k 702A. The update indicates that a cache receiving it may process a tagged ordered store request with the value “j.t+2”.

If cache memory slice m 702B has not received the CSA update with the value “j.t+1”, it cannot execute the tagged ordered store request with the value “j.t+1” in order. Cache memory slice m 702B must wait until it receives the CSA update with the value “j.t+1” before executing that request.

Referring to FIG. 8, a flowchart is shown of the control functions performed by the control logic 708 of requestor j 701 to support tagged ordered store requests. As noted above, in this description the number “Np” denotes the number of processors sharing the distributed cache.

At 800, the system is initialized or reset. At 802, every processor and cache requestor j 701 sets its token register TR 706, and hence its token value “t”, to an initial value (such as 0). As described further below, all entries in the cache sequence array of each cache memory slice (such as the cache sequence arrays 712 of cache memory slices 702A and 702B) are likewise set to the same initial value of “t” (such as 0).

At 804, the control logic determines whether requestor j 701 is ready to send an ordered store request to the physically distributed cache memory system. If not, the control logic loops back to 804, in effect waiting for an ordered store request to be issued. If an ordered store request is to be sent to the physically distributed cache memory system for processing, the control logic proceeds to 806.

At 806, the ordered store request is tagged with the current value of the TRU tag 503, which comprises the RID “j” 501 and the token register value “t” 502, as shown in FIG. 5A. The value of the TRU tag 503 is written “j.t”. The control logic then proceeds to 808.

At 808, the control logic 708 of requestor j 701 increments the token register TR 706, so that t is assigned the value t+1 for use in the next ordered store request. The control logic then proceeds to 810.

At 810, the tagged ordered store request 500 is issued to the physically distributed cache memory system. The address of the tagged ordered store request is hashed, and the request is sent to the appropriate cache memory slice (such as cache memory slice k 702A).
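The requestor-side steps 802 through 810 of FIG. 8 can be sketched as follows. The class name `Requestor`, the return format, and in particular the hash (address modulo the number of slices) are assumptions made for illustration; the description says only that the address is hashed to select a slice.

```python
class Requestor:
    """Hypothetical model of requestor j 701 and its control logic 708."""

    def __init__(self, rid: int, num_slices: int, bits: int) -> None:
        self.rid = rid                    # constant requestor identifier j
        self.num_slices = num_slices
        self.modulus = 1 << bits          # token register is b bits wide
        self.token = 0                    # step 802: TR initialized to 0

    def issue(self, address: int, data: bytes):
        """Tag one ordered store request and pick its destination slice."""
        tag = (self.rid, self.token)                    # step 806: tag with j.t
        self.token = (self.token + 1) % self.modulus    # step 808: t becomes t+1
        # Step 810: hash the address to choose a slice. The modulo hash is
        # an assumed placeholder, not the hash used by the described system.
        slice_index = address % self.num_slices
        return slice_index, tag, address, data
```

Two consecutive calls to `issue` yield tags j.t and j.(t+1), possibly routed to different slices, which is the situation of ST.REL A and ST.REL B in FIG. 7.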

FIG. 9A is a first flowchart of control functions performed by the control logic 714 of each cache memory slice to support tagged ordered store requests. FIG. 9B is a second flowchart of control functions performed by the control logic 714 of each cache memory slice to support tagged ordered store requests.

Referring to FIG. 9A, a flowchart is shown of the control functions performed by the control logic 714 of each cache memory slice to determine whether an ordered store request can be processed. At 900, the system is initialized or reset, as described above for 800. At 902, all entries in the cache sequence array 712 of each cache memory slice are set to the initial value of “t” (such as 0). This matches the initial token value “t” that each requestor j 701 has in its token register TR 706. The control logic then proceeds to 904.

At 904, the control logic of each cache memory slice determines whether it has received a tagged ordered store request 500 from a requestor. If not, the control logic loops back to 904, in effect waiting to receive a tagged ordered store request. If a tagged ordered store request has been received for processing, the control logic proceeds to 906.

At 906, the TRU tag j.t is extracted from the tagged ordered store request in order to determine whether the ordered store request can be processed by the given cache memory slice. For the received requestor identifier value “j”, the cache memory slice reads the cache sequence entry of the processor that made the ordered store request, namely the value CSA[j] (where j takes values from 0 to (S-1), assuming an initial value of 0). Recall that “S” denotes the number of outstanding ordered stores each processor supports.

At 908, the CSA[j] entry for requestor j, which is the expected sequence number, is compared with the “t” portion of the tag of the ordered store request. If CSA[j] equals the “t” portion of the tag, the tags match, the request can be processed, and the control logic proceeds to 912. If CSA[j] does not equal the “t” portion of the tag, the tags do not match and the control logic proceeds to 913.

At 913, the tagged ordered store request, including its tag, is stored in the cache's normal work queue for later processing.

At 912, the tags match (CSA[j] = t), so the cache processes the ordered store request and proceeds to 914.

At 914, the CSA[j] entry for the given requestor is incremented, and the control logic proceeds to 916.

At 916, a CSA update is issued to all other cache memory slices. The cache memory slice that processed the tagged ordered store request issues the token j.(t+1) to all other cache memory slices in the system, indicating that they may process the message from requestor j corresponding to token value t+1, if they have one to process. The requestor also increments its own token and checks the work queue for any matching requests after the CSA update.
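The slice-side flow of FIG. 9A (steps 902 through 916) can be sketched as a small model. The class `CacheSlice`, the dictionary used as storage, and the choice to return the CSA update to the caller for delivery are assumptions for illustration; in particular, `receive_csa_update` only approximates the re-check of the work queue mentioned above and simply overwrites CSA[j] with the received token.

```python
class CacheSlice:
    """Hypothetical model of one cache memory slice and its CSA."""

    def __init__(self, num_requestors: int, bits: int) -> None:
        self.csa = [0] * num_requestors   # step 902: all entries start at 0
        self.modulus = 1 << bits
        self.held = []                    # work queue for requests that must wait
        self.memory = {}                  # stand-in for the cache data array

    def receive_store(self, rid, token, address, data):
        """Return the CSA update (rid, next_token) to broadcast, or None if held."""
        if self.csa[rid] != token:                        # step 908: no match
            self.held.append((rid, token, address, data)) # step 913: hold it
            return None
        self.memory[address] = data                       # step 912: do the store
        self.csa[rid] = (token + 1) % self.modulus        # step 914: bump CSA[j]
        return (rid, self.csa[rid])                       # step 916: update to send

    def receive_csa_update(self, rid, token):
        """Apply an update from another slice and run any request it unblocks."""
        self.csa[rid] = token
        produced = []
        progressing = True
        while progressing:
            progressing = False
            for req in list(self.held):
                if req[0] == rid and req[1] == self.csa[rid]:
                    self.held.remove(req)
                    produced.append(self.receive_store(*req))
                    progressing = True
        return produced
```

Returning updates instead of delivering them directly lets a caller model network delay, for example by letting ST.REL B reach slice m before the update from slice k does.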

The handling of out-of-order CSA updates is now described. As noted above, the network of physically distributed cache memory slices may reorder tagged ordered store requests, which must nevertheless be processed in order. The network may likewise reorder the CSA updates received by each cache memory slice.

Referring briefly to FIG. 7, consider, for example, that cache memory slice 702A sends two CSA updates in succession to nearby caches, carrying tag updates j.(t+1) and j.(t+2), and that they reach cache memory slice 702B out of order. Assume further that cache memory slice 702B holds tagged ordered store requests that must wait for the CSA updates carrying tag updates j.(t+1) and j.(t+2), but that the only CSA update it has received from cache memory slice 702A is the one indicating tag update j.t+2. In this scenario, the CSA updates are out of order.

Suppose a cache memory slice receives the tag updates of CSA update messages out of order, for example tag update j.t+5 before tag updates j.t+1, j.t+2, j.t+3, and j.t+4. For tag update j.t+5 to have been issued at all, some other cache memory slice must already have received j.t+1, and j.t+2, j.t+3, and so on must then have been generated in sequence. Therefore, if a cache memory slice receives tag update j.t+n without having received the earlier updates, it is safe for the cache, on receiving j.t+n, to process all ordered stores up to and including j.t+n.

The addition t+n is performed modulo 2^b, where b is the number of bits in the counter portion of the tag. Because the counter has a limited number of bits b, an addition may exceed the maximum counter value and roll over to a small value. Care must be taken to avoid adverse effects of the rollover condition. In one embodiment, the number of bits “b” is large enough relative to the maximum network latency (the maximum network delay) that the processor has processed all previous ordered store requests by the time the token register rolls over. In another embodiment, a processor/requestor whose TR register may roll over polls each cache memory slice to determine whether each has processed all tagged memory requests up to S-1. Once all cache memory slices have responded to the processor that they are finished, the processor allows that TR register to roll over.

The TR counter of each requester has a limited number of bits ("b" bits) and therefore generates tags with "b" bits. That is, the maximum counter value, and hence the maximum value of the tag token "t", is 2^b - 1.
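The wraparound behavior described above can be illustrated with a minimal sketch. The function names `next_token` and `max_token` and the default width b = 4 are assumptions chosen for illustration only; the description does not fix a counter width or name these operations.

```python
def max_token(b: int = 4) -> int:
    """Largest value a b-bit token counter can hold: 2**b - 1 (b = 4 is illustrative)."""
    return (1 << b) - 1


def next_token(t: int, n: int = 1, b: int = 4) -> int:
    """Advance token t by n, with the addition performed modulo 2**b."""
    return (t + n) % (1 << b)


# With b = 4 the counter covers 0..15; the token after the maximum wraps to 0.
print(max_token())      # largest 4-bit token
print(next_token(14))   # one below the maximum advances to the maximum
print(next_token(15))   # the maximum rolls over to 0
```

The sketch only shows the arithmetic; it does not model either of the rollover safeguards described above (a sufficiently wide counter, or polling the slices before rollover).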

Assume that cache memory slice k 702A receives a CSA update with the tag j.(2^b-2) but has not received the other CSA updates, including those with tag values j.0 through j.(2^b-3). Further assume that cache memory slice k 702A processes all of its ordered store requests and sends a CSA update with the tag value j.(2^b-1). Issuing the CSA update with tag update j.(2^b-1) may trigger other cache memory slices (such as cache memory slice m 702B) to process tagged ordered store requests having the tag j.(2^b-1) without waiting for the other CSA updates. Thereafter, cache memory slice m 702B may issue a CSA update message with the update tag j.0, because j.0 is the next counter value after j.(2^b-1).

Referring to FIG. 9B, a flowchart illustrates the control functions performed by the control logic of a cache memory slice to process tag updates.

At 950, the CSA update routine executed by the control logic of each cache memory slice is initialized at power-up or reset.

At 952, the control logic determines whether the cache memory slice has received a CSA update message with tag update j.t. If not, the control logic loops back to 952, in effect waiting for a tag update to arrive. When a CSA update message with tag update j.t is received, the control logic proceeds to 972.

At 972, the control logic updates the current entry in the cache sequence array table by setting CSA[j] equal to t. Next, at 974, the control logic causes the cache memory slice to process any pending ordered store requests with the tag "j.t". After the ordered store requests have been processed, at 980, the control logic returns to 952 to receive the next update.
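The update-handling steps can be sketched as follows. This is a simplified software model under stated assumptions, not the control logic of the embodiment: the class `CacheSlice`, its methods, and the dictionary standing in for the cache array are all hypothetical names, and the wait loop at 952 is represented by whoever calls `on_csa_update`.

```python
from collections import defaultdict


class CacheSlice:
    """Hypothetical model of one cache memory slice (all names are illustrative)."""

    def __init__(self):
        self.csa = {}                      # CSA[j]: latest token recorded for requester j
        self.pending = defaultdict(list)   # (j, t) -> ordered stores waiting on tag j.t
        self.memory = {}                   # address -> data, standing in for the cache array

    def queue_store(self, j, t, addr, data):
        """Hold an ordered store tagged j.t until the matching tag update arrives."""
        self.pending[(j, t)].append((addr, data))

    def on_csa_update(self, j, t):
        """Handle one CSA update message carrying tag update j.t."""
        self.csa[j] = t                                    # step 972: CSA[j] = t
        for addr, data in self.pending.pop((j, t), []):    # step 974: process pending j.t stores
            self.memory[addr] = data
        # step 980: control returns to the caller, which waits for the next update (step 952)


slice_k = CacheSlice()
slice_k.queue_store(j=3, t=5, addr=0x40, data="A")
slice_k.on_csa_update(j=3, t=5)
print(slice_k.csa[3], slice_k.memory)
```

Keying the pending stores by the full tag (requester, token) keeps step 974 a single lookup; an update for which no store is pending simply records the new token.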

While specific embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of the broad description and not limiting. Because various other modifications may occur to those skilled in the art, the invention is not limited to the specific constructions and arrangements shown and described. For example, the invention or some of its features may be implemented in hardware, firmware, software, or a combination thereof. In that case, the software is provided on a processor-readable storage medium (such as a magnetic, optical, or semiconductor storage device).

Block diagram of a general computer system in which the present invention may be used.
Block diagram of a central processing unit in which the present invention may be used.
Block diagram of a multiprocessor central processing unit in which the present invention may be used.
Block diagram of an embodiment of a multiprocessor system in which the present invention may be used.
Block diagram of another embodiment of a multiprocessor system in which the present invention may be used.
Block diagram of another embodiment of a multiprocessor system in which the present invention may be used.
Block diagram of a logically shared, physically distributed cache memory system.
Diagram of typical fields of a tagged ordered store request.
Diagram of typical fields of a CSA update.
Block diagram of a cache memory slice.
Block diagram of a cache sequence array (CSA).
Block diagram of an exemplary sequence of in-order execution of ordered store requests.
Flowchart of control functions performed by the control logic of a requester to support tagged ordered store requests.
Flowchart of control functions performed by the control logic of each cache memory slice to determine whether an ordered store request can be processed.
Flowchart of control functions performed by the control logic of each cache memory slice to process tag updates.

Claims

A physically distributed cache memory system comprising:
an interconnect network;
first level cache memory slices coupled to the interconnect network to generate tagged ordered store requests, each tagged ordered store request having a tag that includes a requester identification and a store sequence token; and
second level cache memory slices coupled to the interconnect network to execute the ordered store requests in order across the physically distributed cache memory system in response to each tag of the tagged ordered store requests.

The physically distributed cache memory system according to claim 1,
wherein one or more of the tagged ordered store requests are received out of order by at least one of the second level cache memory slices.

The physically distributed cache memory system according to claim 1,
wherein each of the first level cache memory slices has a unique requester identification and a sequence token register to generate the tagged ordered store requests.

The physically distributed cache memory system according to claim 1,
wherein each of the second level cache memory slices has a cache sequence array to execute the ordered store requests in order across the physically distributed cache memory system.

The physically distributed cache memory system according to claim 4,
wherein the cache sequence array has a cache sequence array table that stores, as a cache sequence entry, a store sequence token associated with an ordered store request, and
the cache sequence entry indicates one or more ordered store requests that the cache memory slice can currently execute.

The physically distributed cache memory system according to claim 4,
wherein each of the second level cache memory slices further comprises control logic coupled to the cache sequence array, and
the control logic controls in-order execution of the ordered store requests and generates cache sequence array updates to update the second level cache memory slices.

The physically distributed cache memory system according to claim 1,
wherein each of the first level cache memory slices is coupled to a processor to generate the tagged ordered store requests.

The physically distributed cache memory system according to claim 7,
wherein the processor includes an internal cache memory to generate the tagged ordered store requests.

The physically distributed cache memory system according to claim 1,
further comprising a higher level cache memory coupled to one or more processors.

The physically distributed cache memory system according to claim 9,
wherein the one or more processors include an internal cache memory to generate the tagged ordered store requests.

The physically distributed cache memory system according to claim 9,
wherein the higher level cache memory generates the tagged ordered store requests.

A method for ordered store requests in a physically distributed cache memory system, comprising:
tagging an ordered store request with a tag indicating a requester identification and a store sequence number;
sending the tagged ordered store request to a cache memory slice of the physically distributed cache memory system;
comparing an expected sequence number associated with the requester identification with the store sequence number; and
executing the ordered store request if the store sequence number matches the expected sequence number.

The method of claim 12,
further comprising storing the tagged ordered store request for later execution if the store sequence number does not match the expected sequence number.

The method of claim 12,
further comprising updating the expected sequence number associated with the requester identification in response to executing the ordered store request.

The method of claim 12,
further comprising, before executing the ordered store request, determining whether a previous ordered store request made by the requester has been executed and, if so, executing the ordered store request.

The method of claim 15,
wherein the determining determines whether a store sequence number associated with the previous ordered store request and the requester identification has been received by the cache memory slice in order to execute the ordered store request.

The method of claim 15,
wherein, when it is determined that not all previous ordered store requests made by the requester have been executed, execution of the current ordered store request is delayed until all previous ordered store requests have been executed.

A data signal flow coupled to a cache memory for in-order execution of ordered store requests, comprising:
a requester identifier indicating the requester that initiated an ordered store request;
a token value indicating the position of the ordered store request in a sequence; and
a message identifier indicating that the requester identifier and the token value are available to be read by the cache memory.

The data signal flow according to claim 18,
wherein the message identifier further indicates that the requester identifier and the token value are attached to the ordered store request.

The data signal flow according to claim 19, further comprising:
the ordered store request;
an address at which data is to be stored; and
the data of the ordered store request that is to be stored.

The data signal flow according to claim 18,
wherein the message identifier further indicates that the requester identifier and the token value are for updating a cache sequence array.

A processing unit comprising:
a plurality of processors each having one or more levels of processor cache memory;
a distributed cache memory system coupled to the plurality of processors and having a primary interconnect network and a plurality of cache memory slices coupled to the primary interconnect network; and
a plurality of cache memories coupled between the plurality of processors and the primary interconnect network of the distributed cache memory system,
wherein the processors or the cache memories have a unique requester identification and a sequence token register to generate a tag to be attached to an ordered store request directed to the plurality of cache memory slices of the distributed cache memory system, the tag having the requester identification and a sequence token associated with the requester identification, and
each cache memory slice has a cache sequence array to execute the ordered store requests in order across the distributed cache memory system.

The processing unit according to claim 22,
wherein one or more of the ordered store requests directed to the plurality of cache memory slices are received out of order by at least one cache memory slice.

The processing unit according to claim 22,
wherein the cache sequence array has a cache sequence array table that stores, as a cache sequence entry, a store sequence token associated with an ordered store request, and
the cache sequence entry indicates an ordered store request that the cache memory slice can currently execute.

The processing unit according to claim 24,
wherein one cache memory slice of the plurality of cache memory slices receives an ordered store request having a sequence token that matches the cache sequence entry of a given requester, indicating in-order receipt, and
the one cache memory slice executes the current ordered store request.

The processing unit according to claim 25,
wherein the one cache memory slice further updates the cache sequence entry of the plurality of cache memory slices.

The processing unit according to claim 24,
wherein one cache memory slice of the plurality of cache memory slices receives an ordered store request having a sequence token that does not match the cache sequence entry of a given requester, indicating out-of-order receipt, and
the one cache memory slice stores the current ordered store request for later in-order execution with other ordered store requests.

A computer system comprising:
an input/output device;
a dynamic random access memory; and
a multiprocessor processor coupled to the dynamic random access memory and the input/output device,
wherein the multiprocessor processor includes:
a plurality of processors each having one or more levels of processor cache memory;
a distributed cache memory system coupled to the plurality of processors and having an interconnect network and a plurality of cache memory slices coupled to the interconnect network; and
a plurality of cache memories coupled between the plurality of processors and the interconnect network,
wherein the processors or the cache memories have a unique requester identification and a sequence token register to generate a tag to be attached to an ordered store request directed to the plurality of cache memory slices of the distributed cache memory system, the tag having the requester identification and a sequence token associated with the requester identification, and
each cache memory slice has a cache sequence array to execute the ordered store requests in order across the distributed cache memory system.

The computer system according to claim 28,
wherein one cache memory slice of the plurality of cache memory slices receives an ordered store request having a sequence token that matches the cache sequence entry of a given requester, indicating in-order receipt, and
the one cache memory slice executes the current ordered store request.

The computer system according to claim 29,
wherein the one cache memory slice further updates the cache sequence entry of the plurality of cache memory slices.

The computer system according to claim 28,
wherein the ordered store request is a store request of a requester that requires in-order execution with respect to other ordered store requests of that requester.

The computer system according to claim 31,
wherein previous ordered store requests must be processed before the current ordered store request can be processed.

The computer system according to claim 30,
wherein an unordered store request is a store request of a requester that can be executed out of order with respect to other ordered store requests of that requester.

A multiprocessor comprising an integrated circuit having:
an interconnect network;
a plurality of processors coupled to the interconnect network to generate tagged ordered store requests, each having a tag including a requester identification and a store sequence token; and
a plurality of cache memory slices coupled to the interconnect network to execute the ordered store requests from the plurality of processors in order in response to each tag of the tagged ordered store requests.

The multiprocessor according to claim 34,
wherein the ordered store request is a store request that requires in-order execution with respect to other ordered store requests of the requester.

The multiprocessor according to claim 35,
wherein previous ordered store requests must be processed before the current ordered store request can be processed.

The multiprocessor according to claim 35,
wherein an unordered store request is a store request that can be executed out of order with respect to other ordered store requests.

The multiprocessor according to claim 34,
wherein each of the plurality of cache memory slices includes a cache sequence array to execute the ordered store requests in order across the interconnect network.

The multiprocessor according to claim 38,
wherein the cache sequence array has a cache sequence array table that stores, as a cache sequence entry, a store sequence token associated with an ordered store request, and
the cache sequence entry indicates an ordered store request that the cache memory slice can currently execute.

The multiprocessor according to claim 34,
wherein the plurality of cache memory slices coupled to the interconnect network form a distributed cache memory system shared by the plurality of processors.