JP2005519389A

JP2005519389A - Method for prefetching data/instructions related to externally triggered events

Info

Publication number: JP2005519389A
Application number: JP2003573543A
Authority: JP
Inventors: ドエリング、アンドスリアス
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2002-03-05
Filing date: 2003-02-27
Publication date: 2005-06-30
Also published as: AU2003221510A1; BR0308268A; MXPA04008502A; CA2478007A1; WO2003075154A3; KR20040101231A; AU2003221510A8; CN100345103C; WO2003075154A2; CN1698031A

Abstract

Problem: To provide a method for prefetching data/instructions related to externally triggered events in a system that includes an infrastructure (18) having an input interface (20) for receiving data/instructions to be processed and an output interface (22) for transmitting data after processing, a memory (14) for storing the data/instructions when they are received through the input interface, and a processor (10) for processing at least some of the data/instructions.
Solution: The processor has a cache in which data/instructions are stored before being processed, and is assigned sequential tasks by an external source (26). The method comprises the following steps, performed while the processor is executing a previous task: determining the locations in memory of the data/instructions to be processed by the processor, indicating the addresses of those memory locations to the cache, fetching the contents of the memory locations and writing them into the cache, and assigning the task of processing the data/instructions to the processor.

Description

The present invention relates generally to systems in which a processor can be interrupted by an external source, such as the scheduler in a network processor, in order to process a task whose data/instructions are unrelated to the previous task. In particular, it relates to a method for prefetching data/instructions related to externally triggered events.

The efficiency of modern microprocessors and microprocessor cores depends strongly on the efficiency of the cache, because the instruction cycle time is much shorter than the memory access time. Caches exploit the locality of memory accesses, that is, the fact that a memory access is very likely to be close to a previous access.

A cache includes a mechanism, the cache controller, for loading new contents into a selected area (a cache line), making room for them by discarding old entries. The cache controller can currently be activated by software using cache prefetch instructions (for example, Data Cache Block Touch on all PowerPC-compliant devices). In addition, there are proposals for cache controllers that recognize regular access patterns such as linear strides or linked data structures. However, these approaches do not cover externally triggered events, where the required memory contents are unrelated to previous processing. In such cases, the only thing known about the required memory contents is the source of the event, such as an interrupt source, a scheduler, or another processor that assigns the task.

In a system where an external source, such as the scheduler in a network processor, can interrupt the processor to process data unrelated to the data processed previously, the processor incurs a cache miss. This means that the processor stalls until the data it needs has been loaded from memory into the cache, which wastes a considerable amount of time. With current memory technology and a processor clock speed of 400 MHz, every cache miss costs 36 processor clock cycles, which corresponds to about 40 instructions. Because current technology trends improve processor instruction rates faster than memory latency, the number of instructions lost per cache miss keeps growing.
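The figures quoted above can be checked with a short calculation. The following Python sketch converts the 36-cycle miss penalty at 400 MHz into wall-clock time and shows the instruction throughput implied by "about 40 instructions". The variable names are illustrative, and the implied instructions-per-cycle value is derived from the quoted numbers rather than stated in the text.

```python
# Worked arithmetic for the cache-miss cost quoted in the description.
CLOCK_HZ = 400e6          # processor clock speed given in the text
MISS_CYCLES = 36          # stall per cache miss given in the text
LOST_INSTRUCTIONS = 40    # "about 40 instructions" given in the text

cycle_time_ns = 1e9 / CLOCK_HZ                  # duration of one clock cycle
miss_time_ns = MISS_CYCLES * cycle_time_ns      # stall time per cache miss
implied_ipc = LOST_INSTRUCTIONS / MISS_CYCLES   # derived here, not stated in the text

print(f"cycle time: {cycle_time_ns:.2f} ns")
print(f"stall per miss: {miss_time_ns:.1f} ns")
print(f"implied instructions per cycle: {implied_ipc:.2f}")
```

Under these numbers, each miss on a freshly assigned packet costs on the order of 90 ns of stalled processor time, which is the cost the prefetch is meant to hide.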

Accordingly, the main object of the present invention is to provide a method for prefetching data/instructions related to externally triggered events, so as to avoid cache misses for data whose addresses can be easily determined.

The present invention therefore relates to a method for prefetching data/instructions related to externally triggered events in a system that includes an infrastructure having an input interface for receiving data/instructions to be processed and an output interface for transmitting data after processing, a memory for storing the data/instructions when they are received through the input interface, and a processor for processing at least some of the data/instructions. The processor has a cache in which data/instructions are stored before being processed, and is assigned sequential tasks by an external source. The method comprises the following steps, performed while the processor is executing a previous task: determining the locations in memory of the data/instructions to be processed by the processor, indicating the addresses of those memory locations to the cache, fetching the contents of the memory locations and writing them into the cache, and assigning the task of processing the data/instructions to the processor.

The above and other objects, features, and advantages of the invention will be better understood by reading the following more detailed description of the invention in conjunction with the accompanying drawings.

Such a system includes a processor core 10, for example a PowerPC processor core, with a data/instruction cache. The system is built around a high-performance bus 12, such as a Processor Local Bus (PLB), which connects to an external memory 14 (for example, an SDRAM) containing data and instructions through a memory controller 16. The memory controller makes the bus architecture independent of the memory by generating, for example, all necessary timing and refresh signals.

The bus 12 and the memory 14 are also used by an infrastructure 18 that handles data packets received from the network on an input interface 20. The infrastructure 18 manages reception and transmission, including packet assembly, memory allocation and deallocation, and insertion into and removal from packet queues.

Some packets do not need to be processed and are sent directly to the network through an output interface 22. Other packets must be processed by the processor 10. Whether a packet needs processing, and what kind of processing is required, is decided by a look-up and classification unit 24. To process a data packet, the processor 10 needs several pieces of information. It must therefore access the packet header and the additional information generated in the infrastructure 18. For example, the infrastructure may have several ports on which packets can arrive, and the processor needs to know which port a packet came from.

A scheduler 26 manages all data packets that require processing by the processor in one or several queues. These queues do not have to exist physically inside the scheduler, but at least the entry at the head of each queue must be stored on-chip. The scheduler keeps track of the processor's activity. When the processor finishes processing a packet, it requests a new task from the scheduler. When several queues with different priorities are managed, however, the scheduler 26 can interrupt the processor while it is processing a lower-priority task in favor of a higher-priority one.
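The queueing behavior described above can be sketched in Python as follows. This is only an illustrative model, not the scheduler of the invention: the names `Task`, `Scheduler`, `request_next`, and `preempt` are hypothetical, a single heap stands in for the several priority queues, lower numbers are assumed to mean higher priority, and re-queueing the interrupted task is an assumption the text does not spell out.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class Task:
    priority: int                          # assumption: lower number = higher priority
    seq: int                               # arrival order, keeps equal priorities FIFO
    packet_id: str = field(compare=False)  # not used for ordering


class Scheduler:
    def __init__(self) -> None:
        self._heap: List[Task] = []
        self._seq = 0
        self.current: Optional[Task] = None  # task the processor is working on

    def enqueue(self, packet_id: str, priority: int) -> Task:
        task = Task(priority, self._seq, packet_id)
        self._seq += 1
        heapq.heappush(self._heap, task)
        return task

    def should_preempt(self) -> bool:
        """True if the best queued task outranks the running task."""
        if self.current is None or not self._heap:
            return False
        return self._heap[0].priority < self.current.priority

    def request_next(self) -> Optional[Task]:
        """The processor finished its packet and asks for a new task."""
        self.current = heapq.heappop(self._heap) if self._heap else None
        return self.current

    def preempt(self) -> Optional[Task]:
        """Interrupt the running task in favor of a higher-priority one."""
        if not self.should_preempt():
            return None
        heapq.heappush(self._heap, self.current)  # assumption: resume it later
        self.current = heapq.heappop(self._heap)
        return self.current


sched = Scheduler()
sched.enqueue("pkt-low", priority=5)
sched.request_next()                  # processor starts on the low-priority packet
sched.enqueue("pkt-high", priority=1)
print(sched.should_preempt())
print(sched.preempt().packet_id)
```

Because the scheduler always knows the head entry of each queue, it also knows in advance which task the processor will receive next, which is what makes the prefetch possible.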

In either case, the scheduler 26 knows which task the processor 10 will process next. The selected task determines which data will be accessed. In a network processor such as the one described above, the relationship between a task (a queue entry) and the addresses accessed first, namely those of the packet header and the additional information, is very simple. The translation from a queue entry to a set of addresses is performed by an address calculation unit.
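A minimal Python sketch of such a translation from a queue entry to a set of addresses is given below. The memory layout is entirely hypothetical: the constants `HEADER_BYTES`, `INFO_BASE`, `INFO_BYTES`, and `CACHE_LINE`, the `QueueEntry` fields, and the function names are assumptions made for illustration, since the text does not specify how packets and additional information are laid out.

```python
from dataclasses import dataclass
from typing import List

HEADER_BYTES = 64          # hypothetical: header bytes the processor reads first
INFO_BASE = 0x8000_0000    # hypothetical: base of the additional-information table
INFO_BYTES = 16            # hypothetical: size of one additional-information record
CACHE_LINE = 32            # hypothetical: cache-line size in bytes


@dataclass(frozen=True)
class QueueEntry:
    packet_addr: int    # where the infrastructure stored the packet in memory
    packet_index: int   # slot of the packet's additional-information record


def line_addresses(start: int, length: int, line: int = CACHE_LINE) -> List[int]:
    """Aligned addresses of every cache line overlapping [start, start + length)."""
    first = start - start % line
    last_byte = start + length - 1
    last = last_byte - last_byte % line
    return list(range(first, last + line, line))


def prefetch_addresses(entry: QueueEntry) -> List[int]:
    """Translate one queue entry into the cache lines to prefetch."""
    header = line_addresses(entry.packet_addr, HEADER_BYTES)
    info = line_addresses(INFO_BASE + entry.packet_index * INFO_BYTES, INFO_BYTES)
    return header + info


addrs = prefetch_addresses(QueueEntry(packet_addr=0x1000_0010, packet_index=3))
print([hex(a) for a in addrs])
```

The point of the sketch is that the translation needs only the queue entry and a few fixed constants, so it can be done without involving the processor.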

When the processor 10 starts processing a new packet and accesses data such as the packet header, this normally causes a cache miss if the invention is not used. This means that the processor stalls until the required data has been loaded from the external memory 14 into the cache, which, as already mentioned, wastes a considerable amount of time. Cache prefetching according to the invention avoids cache misses for data accesses that are certain to occur and whose addresses can be easily determined.

For this to work, the cache must be instructed to load the required data before the access occurs. This action is initiated by the scheduler over a direct connection 28 to the cache. After determining the memory locations where the packet header and the additional information are stored, the scheduler passes the addresses to be fetched to the cache, and the cache controller fetches the data from memory into the cache. Once this write has completed, the scheduler either interrupts the processor and assigns the new packet for processing, if the new task has a higher priority than the previous one, or waits for the previous task to complete before submitting the new packet.

The method according to the invention is represented by the flowchart shown in FIG. 2. First, the infrastructure waits to receive new data, that is, a data packet in the example above (step 30). The packet header is used for classification, and the header and the additional information resulting from processing by the look-up and classification unit are stored in the external memory (step 32). The look-up and classification unit determines whether the packet needs to be processed by software and determines its priority (step 34). If the packet does not require processing, the process loops back to wait for new data (step 30).

If the data packet requires processing, the scheduler must calculate the memory addresses corresponding to the data that the processor will access. In this example, these are the address of the packet header and the addresses of the additional information, such as the classifier result and the input port (step 36). These addresses are then transferred to the processor's data cache controller (step 38). The data cache controller writes the corresponding data into the data cache (step 40). This is done interleaved with the memory accesses generated by the processing of the current packet.

At this stage, the process depends on whether the packet that has just arrived has a higher priority than the previous one (step 42). If it does, the scheduler interrupts the previous task currently being executed by the processor (step 44) and assigns the new packet for processing; the processor starts processing and finds the relevant data in the cache (step 46). If the new packet does not have a higher priority than the previous one, the processor must complete the previous processing (step 48) before processing the new packet (step 46).
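The flow through steps 30 to 48 can be traced with a small Python sketch. The function `handle_packet` and its parameters are hypothetical; it only records which step numbers of the flowchart are visited for one arriving packet, assuming that lower numbers mean higher priority and treating an idle processor as having nothing to interrupt.

```python
from typing import List, Optional


def handle_packet(needs_processing: bool,
                  new_priority: int,
                  current_priority: Optional[int]) -> List[int]:
    """Return the flowchart step numbers visited for one arriving packet.

    Assumptions of this sketch: lower numbers mean higher priority, and
    current_priority is None when the processor is idle.
    """
    steps = [30, 32, 34]        # receive, store header and additional info, classify
    if not needs_processing:
        return steps            # loop back to step 30 and wait for new data
    steps += [36, 38, 40]       # compute addresses, pass them on, fill the cache
    steps.append(42)            # compare priorities
    if current_priority is not None and new_priority < current_priority:
        steps.append(44)        # interrupt the previous task
    else:
        steps.append(48)        # let the previous task run to completion
    steps.append(46)            # process the new packet; its data is already cached
    return steps


print(handle_packet(False, new_priority=1, current_priority=5))
print(handle_packet(True, new_priority=1, current_priority=5))
print(handle_packet(True, new_priority=5, current_priority=1))
```

In both branches the prefetch (steps 36 to 40) happens before the new packet is handed to the processor at step 46, which is the essential ordering of the method.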

It should be noted that, for a high-priority packet, the scheduler must wait for the data cache fetch to complete before interrupting the processor. To do so, the scheduler can observe the activity on the bus and wait until all the accesses it requested have completed. Alternatively, the scheduler can wait for a fixed amount of time, or direct feedback from the cache controller to the scheduler can be used.
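The three ways of detecting completion of the fetch can be contrasted in a short Python sketch. All function and parameter names here are hypothetical, and real hardware would implement these checks as logic rather than software; the sketch only shows what information each option relies on.

```python
from typing import Callable, Iterable, Set


def done_by_bus_observation(requested: Set[int], reads_seen: Iterable[int]) -> bool:
    """Option 1: watch the bus until every requested address has been read."""
    return requested <= set(reads_seen)


def done_by_fixed_delay(cycles_elapsed: int, worst_case_cycles: int) -> bool:
    """Option 2: wait a fixed, conservatively chosen number of cycles."""
    return cycles_elapsed >= worst_case_cycles


def done_by_feedback(controller_idle: Callable[[], bool]) -> bool:
    """Option 3: rely on a direct signal from the cache controller."""
    return controller_idle()


requested = {0x1000, 0x1020}
print(done_by_bus_observation(requested, [0x1000]))            # one line still pending
print(done_by_bus_observation(requested, [0x1020, 0x1000]))    # both lines seen
print(done_by_fixed_delay(cycles_elapsed=50, worst_case_cycles=72))
print(done_by_feedback(lambda: True))
```

The fixed delay needs no extra wiring but must be sized for the worst case, whereas bus observation and direct feedback let the scheduler interrupt as soon as the data is actually in the cache.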

It should also be noted that when two packets are involved, with processing of the first interrupted in order to process the second, higher-priority packet as described above, the data of both must reside in disjoint parts of the cache. Otherwise, the prefetched data could be evicted from the cache before it is accessed. This can be achieved through the virtual-to-real address mapping in the processor, because caches are usually indexed by virtual addresses.
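One way to picture the disjointness requirement is to compute which cache sets each packet's data falls into. The Python sketch below is a simplified model under stated assumptions: the line size and number of sets are hypothetical, the index is taken from whatever address the cache is indexed with, and the check is conservative because a set-associative cache could hold lines from both packets in the same set.

```python
from typing import Iterable, Set

CACHE_LINE = 32    # hypothetical: cache-line size in bytes
NUM_SETS = 128     # hypothetical: number of sets in the cache


def cache_sets(addresses: Iterable[int],
               line: int = CACHE_LINE,
               num_sets: int = NUM_SETS) -> Set[int]:
    """Set indices occupied by the cache lines containing the given addresses."""
    return {(addr // line) % num_sets for addr in addresses}


def disjoint_in_cache(addrs_a: Iterable[int], addrs_b: Iterable[int]) -> bool:
    """True if the two address groups never compete for the same cache set."""
    return cache_sets(addrs_a).isdisjoint(cache_sets(addrs_b))


interrupted = [0x1000, 0x1020]    # data of the packet whose processing is interrupted
conflicting = [0x2000]            # indexes the same set as 0x1000 in this model
compatible = [0x2040]             # indexes a different set

print(disjoint_in_cache(interrupted, conflicting))
print(disjoint_in_cache(interrupted, compatible))
```

Choosing the address mapping so that such a check succeeds keeps the prefetched lines of one packet from displacing the lines of the other.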

Although the method of the invention has been described in a network processor environment, it will be apparent to those skilled in the art that it can be used in any system in which some data is certain to be accessed by the processor and its address can be easily determined. In every case, the external event is tied to some data to be processed. For example, suppose that new pictures arrive at regular intervals at a robot that uses a camera for navigation. The arrival of a picture is the event, and the picture data itself is the associated data to be prefetched.

It should be noted that, in a standard microprocessor, the address bus, which is already observed for cache coherency, can be used for the external source. In this case, only one additional external wire is needed to signal a prefetch request.

FIG. 1 is a block diagram of a network processing system implementing the method according to the present invention.
FIG. 2 is a flowchart representing the steps of the method according to the invention.

Claims

A method for prefetching data/instructions associated with an externally triggered event in a system comprising an infrastructure (18) having an input interface (20) for receiving data/instructions to be processed and an output interface (22) for transmitting data after processing, a memory (14) for storing the data/instructions when they are received through the input interface, and a processor (10) for processing at least some of the data/instructions, the processor having a cache in which data/instructions are stored before being processed and being assigned sequential tasks by an external source (26),
the method comprising the following steps, performed while the processor is executing a previous task:
determining the location in the memory of the data/instructions to be processed by the processor;
indicating the address of the memory location to the cache;
fetching the contents of the memory location and writing them into the cache; and
assigning the task of processing the data/instructions to the processor.

The method according to claim 1, wherein the processor (10) is a network processor and the data to be processed is in the header of a data packet received by the infrastructure (18).

The method of claim 2, wherein the external source is a scheduler (26) directly connected to the cache in the processor (10), the scheduler determining the location in the memory (14) of the data/instructions to be processed and indicating the address directly to the cache.

The method of claim 3, wherein the scheduler (26) determines the location of the data/instructions in the memory (14) by calculating the address.

The method according to any one of claims 2 to 4, wherein the step of assigning a task to process the prefetched data/instructions comprises interrupting the processing of a previous packet and starting the processing of a new packet having a higher priority than the previous packet.

The method according to any one of claims 3 to 5, wherein the cache is associated with a cache controller that fetches the contents of the memory locations whose addresses have been determined by the scheduler (26) and writes them into the cache.

The method of claim 5, wherein a processor local bus (PLB) is used by the processor (10) and the scheduler (26), the scheduler monitoring the bus to observe exactly when the data returns from the memory (14) and interrupting the processor after thereby determining that the data cache fetch has completed.

A system for prefetching data/instructions associated with an externally triggered event, the system comprising an infrastructure (18) having an input interface (20) for receiving data/instructions to be processed and an output interface (22) for transmitting data after processing, a memory (14) for storing the data/instructions when they are received through the input interface, and a processor (10) for processing at least some of the data/instructions, the processor having a cache in which data/instructions are stored before being processed and being assigned sequential tasks by an external source (26),
the system comprising the following means, which operate while the processor is executing a previous task:
means for determining the location in the memory of the data/instructions to be processed by the processor;
means for indicating the address of the memory location to the cache;
means for fetching the contents of the memory location and writing them into the cache; and
means for assigning the task of processing the data/instructions to the processor.

The system of claim 8, wherein the processor (10) is a network processor and the data to be processed is in the header of a data packet received by the infrastructure (18).

The system of claim 9, wherein the external source is a scheduler (26) directly connected to the cache in the processor (10), the scheduler determining the location in the memory (14) of the data/instructions to be processed and indicating the address directly to the cache.

The system of claim 10, wherein the scheduler (26) determines the location of the data/instructions in the memory (14) by calculating the address.

The system according to any one of claims 9 to 11, wherein the means for assigning a task to process the prefetched data/instructions interrupts the processing of a previous packet and starts the processing of a new packet having a higher priority than the previous packet.

The system according to any one of claims 11 to 13, wherein the cache is associated with a cache controller that fetches the contents of the memory locations whose addresses have been determined by the scheduler (26) and writes them into the cache.

The system of claim 12, wherein a processor local bus (PLB) is used by the processor (10) and the scheduler (26), the scheduler monitoring the bus to observe when the data returns from the memory (14) and interrupting the processor after thereby determining that the data cache fetch has completed.