
EP1576465A1 - Counter based stride prediction for data prefetch - Google Patents

Counter based stride prediction for data prefetch

Info

Publication number
EP1576465A1
EP1576465A1 (application EP03778603A)
Authority
EP
European Patent Office
Prior art keywords
stride
count
equal
measure
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP03778603A
Other languages
German (de)
French (fr)
Inventor
Jan Hoogerbrugge
Jan-Willem Van De Waerdt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV
Publication of EP1576465A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
    • G06F9/345Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
    • G06F9/3455Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results using stride
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing
    • G06F9/383Operand prefetching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing
    • G06F9/383Operand prefetching
    • G06F9/3832Value prediction for operands; operand history buffers


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Advance Control (AREA)

Abstract

A prefetching system (400) includes hysteresis in the determination and modification of a stride value (412) that is used for prefetching data (130) in a sequential process. Once a stride value is determined, intermittent stride inconsistencies are ignored (322-330), and the stride value retains its prior value. When the stride inconsistencies become frequent (322-330), the stride value is modified (230). When the modified stride value becomes repetitive, the system adopts this value as the stride, and subsequent stride inconsistencies are again ignored, and the stride value thereafter retains its current value until inconsistencies become frequent.

Description

COUNTER BASED STRIDE PREDICTION FOR DATA PREFETCH
This invention relates to the field of electronics, and specifically to a method and system for predicting a next data location in a process to facilitate a prefetch of data from that location.
Prefetching is a common technique for minimizing latency in a sequential process. Data that is expected to be needed at a future time in the process is retrieved from a memory and stored in a cache, for subsequent access at the future time. The cache is designed to provide a substantially faster access time than the memory. Thus, when the process needs the data, and the data is in the cache, the data is provided to the process at the higher cache- access speed. Conversely, if the data is not in the cache, the data is not provided to the process until after the substantially slower memory- access time, thereby introducing memory-access delays into the process.
A variety of techniques are commonly available to facilitate the prediction of the data that is going to be needed at a future time. One such technique is "stride prediction", wherein the location of the next data item to be needed is based on the sequence of locations of the prior-accessed data items. For example, data is often stored as an array or list of data records, such as records of employees' names, addresses, social security numbers, and so on. Typically, these records are fixed-length records, such that the beginning of each record is separated by a fixed number of memory locations from its adjacent records in the memory. An application that provides a printout or display of all employee names, for example, will sequentially access memory locations that are separated by this fixed number. Given that the application accesses a first employee name from the memory at location L, and accesses a second employee name at location L+S, it is likely that the application will access data located at location L+S+S of the memory to obtain the next employee name. If the data at location L+S+S is retrieved from memory and stored in a higher-speed cache memory before this access is requested, the third employee name can be provided to the application more quickly than the first or second employee names.
FIG. 1 illustrates an example flow diagram of a prior art stride prediction prefetch process. At 110, the prefetch process is invoked, typically at the same time that an application invokes a request for access to a new data item. Although not illustrated, this prefetch process may be called as part of a global scheme that includes multiple prefetching algorithms, and may be selectively invoked depending upon factors such as the proximity of sequential data accesses, and the like. Typically, a processor maintains a stride prediction table (SPT) that records information related to executed accesses to the memory, including the interval between sequential accesses, herein termed the stride of the accesses. The stride prediction table is typically configured to record multiple sets of information related to executed accesses to keep track of multiple potential strides. For ease of understanding, the invention is presented herein using the paradigm of a single set of information that is used to keep track of a single stride between related memory accesses. One of ordinary skill in the art will recognize that the principles presented herein are directly applicable to the conventional use of a stride prediction table that includes multiple sets of information related to multiple strides. At 120, the current stride is determined as the difference between the address of the prior/old access and the address of the new requested access. On the assumption that the next requested access will be spaced at this same interval, a prefetch is executed to fetch the data at an address at the same interval from the current/new address, at 130. At 140, the new address replaces the old address, in preparation for the next data access, and the prefetch routine terminates, at 150.
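The FIG. 1 flow can be sketched in a few lines. The following Python sketch is illustrative only: the class and variable names are hypothetical, and the numbers in the comments refer to the flow-diagram blocks described above.

```python
# Minimal sketch of the FIG. 1 scheme: every access triggers a prefetch
# at the same interval (stride) as the last pair of accesses.

class NaiveStridePrefetcher:
    def __init__(self):
        self.prev_addr = None  # address of the prior access (block 140 state)

    def on_access(self, addr):
        """Return the address to prefetch, or None on the first access."""
        prefetch = None
        if self.prev_addr is not None:
            stride = addr - self.prev_addr   # block 120
            prefetch = addr + stride         # block 130
        self.prev_addr = addr                # block 140
        return prefetch

p = NaiveStridePrefetcher()
print([p.on_access(a) for a in (100, 108, 116, 200)])
# → [None, 116, 124, 284]
```

Note that the final access (200) still triggers a prefetch (at 284) even though the stride pattern was broken, which is exactly the excess-traffic problem discussed next.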
By initiating the prefetch for the next-likely data at the time of the requested access to the new data, the prefetched data will be present in the higher-speed cache when and if the application initiates an access request for this data. If the next-likely data is not the data that is subsequently requested, the cache will not contain the next-requested data, and a memory access will be required. The prefetch process of FIG. 1, however, initiates a prefetch on every access to new data, without regard to whether there is any basis for the assumption that the next-likely requested data will be equally spaced from the prior-requested data. This causes substantial memory-access traffic, which can serve to substantially reduce the effective memory-access speed. FIG. 2 illustrates an example flow diagram of an improved prior art stride prediction prefetch process, wherein a prefetch is initiated if and only if two successive accesses exhibit the same stride. In this embodiment, at 220, the determined stride for the new access request is compared to the prior determined stride. If the new stride equals the old stride, the likelihood that the next stride will also be equal to these two strides is sufficiently high to warrant a prefetch, at 130. If the new stride differs from the old stride, the new stride replaces the old stride, at 230, in preparation for the next cycle. One of ordinary skill in the art will recognize that this two-in-a-row criterion for initiating a prefetch may be extended to three-in-a-row, four-in-a-row, and so on, to trade off between excessive memory-access traffic and the likelihood of having the next-requested data in cache. It is an object of this invention to improve the likelihood of prefetching data that will subsequently be accessed by an application. It is a further object of this invention to provide an efficient prefetch scheme that is well suited for hardware implementation.
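The FIG. 2 refinement can be sketched similarly. Again, all names are illustrative, and the comments reference the flow-diagram blocks; a prefetch is issued only after two successive accesses exhibit the same stride.

```python
# Sketch of the FIG. 2 scheme: prefetch only when two successive
# accesses exhibit the same stride (the "two-in-a-row" criterion).

class TwoInARowPrefetcher:
    def __init__(self):
        self.prev_addr = None    # address of the prior access
        self.prev_stride = None  # stride of the prior pair of accesses

    def on_access(self, addr):
        prefetch = None
        if self.prev_addr is not None:
            stride = addr - self.prev_addr      # block 120
            if stride == self.prev_stride:      # block 220
                prefetch = addr + stride        # block 130
            else:
                self.prev_stride = stride       # block 230
        self.prev_addr = addr                   # block 140
        return prefetch

p = TwoInARowPrefetcher()
print([p.on_access(a) for a in (100, 108, 116, 124, 500)])
# → [None, None, 124, 132, None]
```

Unlike the FIG. 1 sketch, the off-stride access (500) no longer triggers a prefetch; but it also immediately discards the established stride of 8, which is the weakness the invention addresses.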
These objects and others are achieved by a prefetching system that includes hysteresis in the determination and modification of a stride value that is used for prefetching data in a sequential process. Once a stride value is determined, intermittent stride inconsistencies are ignored, and the stride value retains its prior value. When the stride inconsistencies become frequent, the stride value is modified. When the modified stride value becomes repetitive, the system adopts this value as the stride, and subsequent stride inconsistencies are again ignored, and the stride value thereafter retains its current value until inconsistencies become frequent.
FIG. 1 illustrates an example flow diagram of a prior art stride prediction prefetch process.
FIG. 2 illustrates an example flow diagram of an alternative prior art stride prediction prefetch process.
FIG. 3 illustrates an example flow diagram of a stride prediction prefetch process in accordance with this invention.
FIG. 4 illustrates an example block diagram of a stride prediction prefetch system in accordance with this invention. Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions.
This invention is premised on the observation that regular stride patterns in memory accesses often include intermittent data accesses that do not conform to the stride pattern. For example, a nested loop structure may include an inner loop that cycles through a list of data, and an outer loop that presets one or more variables that are used in each cycle of the inner loop, or an outer loop that stores a result from each cycle of the inner loop. In this example, the inner loop that is cycling through the list of data will likely exhibit a constant stride. At each re-commencement of the inner loop, however, the memory access is to the start of the list, whereas the prior access was to the end of the list. The span between the end of the list and the start of the list, however, will not correspond to the inner loop's stride. Additionally, a data access by the outer loop at the beginning or end of the loop will produce a span between accesses that does not correspond to the inner loop's stride. Other examples of an intermittent break in stride include the processing of data that is organized in a multi-dimension array. Typically, the data is processed for a given range along one dimension, then an index to another dimension is incremented, and the data at this next indexed other dimension is processed for the given range along the first dimension. The stride along the range of the first dimension will generally be constant, but the increment of the index to the next dimension will likely result in an access having a span from the prior access that does not match the stride along the first dimension.
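The nested-loop observation above can be made concrete with a short illustrative trace. The base address, record size, and loop counts below are hypothetical values chosen only to exhibit the pattern.

```python
# Illustrative address trace for the nested-loop example: an inner loop
# reads 4 fixed-size records (size 8) starting at a hypothetical base
# address 1000, and each outer iteration restarts the inner loop at the
# start of the list.

RECORD_SIZE = 8
trace = []
for outer in range(3):
    for inner in range(4):
        trace.append(1000 + inner * RECORD_SIZE)

# the stride between each pair of successive accesses
strides = [b - a for a, b in zip(trace, trace[1:])]
print(strides)
# → [8, 8, 8, -24, 8, 8, 8, -24, 8, 8, 8]
```

The stride is a constant +8 within the inner loop, with an intermittent -24 at each restart; these isolated breaks are what the counter-based scheme below is designed to absorb.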
In a conventional stride prediction process, whenever the stride is 'broken' or interrupted, the process of determining the stride is repeated to determine a new stride. During the time that the new stride is being determined, prefetches do not occur, and the application is delayed by memory-accesses at each re-start of an inner loop, or each incrementing of a higher-level index during the processing of a multi-dimension array.
In accordance with this invention, the stride value is preserved during intermittent breaks in stride. Performing a prefetch of data at the current stride is dependent upon the number of equal-value strides within a given number of memory accesses, and adjusting the stride value is dependent upon the occurrence of multiple non-equal-value strides. In a simple example, a prefetch of data may be performed whenever two out of three accesses have successive equal strides, and a modification of the stride value may be performed whenever two accesses have successive unequal strides.
FIG. 3 illustrates an example flow diagram of a stride prediction prefetch process in accordance with this invention. In this example, a "count" parameter is used to count the number of successive strides of the same value, up to a selected maximum. In a preferred embodiment of this invention, this count parameter is also used to distinguish between intermittent breaks in stride and an actual, persistent, change of stride. It will be evident to one of ordinary skill in the art in view of this disclosure that an independent parameter could be employed to count the number of successive non-equal strides.
After determining the current stride, between the new access address and the prior access address, at 120, the process of this invention compares the current stride with the prior stride, at 220. If the current stride equals the prior stride, the count is incremented, at 326; if not, the current count is decremented, at 322. In a preferred embodiment, the count is limited to a value from zero to a maximum count. Blocks 328 and 324 clip the incremented and decremented counts from blocks 326 and 322, respectively, to remain within these limits.
At 330, the count is compared to an upper limit, UL, and to a lower limit, LL, wherein the lower limit LL is preferably less than the upper limit UL. In this example, the upper limit UL corresponds to the number of equal-stride occurrences required to warrant a prefetch from the next stride-predicted access location. If the count equals or exceeds this upper limit UL, a prefetch of data from an address corresponding to the current address plus the current stride is executed, at 130. For example, if the upper limit UL is a value of two, the prefetch is not executed, initially, unless two-in-a-row of the same stride value occurs. If the upper limit is a value of three, the prefetch is not executed until three-in-a-row of the same stride value occurs, and so on. Thereafter, subsequent equal-valued strides continue to increment the count, up to the maximum, via 326, 328. In a preferred embodiment, the maximum value is selected as the number of successive equal-value strides required to conclude that the current stride value is 'reliable'.
Each time that a non-equal stride occurs, the count is decremented, at 322. Thus, the difference between the maximum count and the current count corresponds to a measure of 'unreliability' of the current stride value. The lower limit is selected as the value of the current count that constitutes a sufficient measure of unreliability to warrant a change to the current stride value, at 230. Generally, as indicated in FIG. 3, if the current stride value is deemed unreliable, at 230, a prefetch, at 130, is not performed. In like manner, if the current stride value is sufficiently reliable to warrant a prefetch, at 130, a modification of the current stride value at 230 is not performed. Thus, the lower limit LL is set to be less than the upper limit UL, and generally set to be UL-1. Optionally, as indicated by the dashed connection between the test block 330 and the block 140, the lower limit LL may be selected relative to the upper limit UL such that the assessed reliability of the current stride is not sufficient to warrant a prefetch, at 130 (count < UL), while the assessed unreliability is not sufficient to warrant a modification of the current stride, at 230 (count > LL).
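The FIG. 3 flow can be sketched as follows. The maximum count, UL, and LL values are design parameters; the values used here (maximum 3, UL = 2, LL = UL - 1 = 1) are illustrative choices, not mandated by the disclosure, and all names are hypothetical.

```python
# Sketch of the FIG. 3 counter-based scheme: a saturating counter is
# incremented on equal strides and decremented on non-equal strides;
# the count gates both the prefetch (>= UL) and the stride update (<= LL).

COUNT_MAX, UL, LL = 3, 2, 1

class CounterStridePrefetcher:
    def __init__(self):
        self.prev_addr = None  # prior access address (register 416)
        self.stride = None     # retained stride value (register 412)
        self.count = 0         # saturating counter (counter 414)

    def on_access(self, addr):
        prefetch = None
        if self.prev_addr is not None:
            stride = addr - self.prev_addr                   # block 120
            if stride == self.stride:                        # block 220
                self.count = min(self.count + 1, COUNT_MAX)  # blocks 326/328
            else:
                self.count = max(self.count - 1, 0)          # blocks 322/324
            if self.count >= UL:                             # block 330
                prefetch = addr + self.stride                # block 130
            elif self.count <= LL:
                self.stride = stride                         # block 230
        self.prev_addr = addr                                # block 140
        return prefetch

p = CounterStridePrefetcher()
inner = [1000, 1008, 1016, 1024, 1032]
results = [p.on_access(a) for a in inner + inner]  # inner loop runs twice
print(results)
# → [None, None, None, 1032, 1040, 1008, 1016, 1024, 1032, 1040]
```

At the inner-loop restart (the sixth access), the counter is merely decremented from its maximum, so the retained stride of 8 still warrants a prefetch of 1008, the very next inner-loop address; a two-in-a-row scheme would instead discard the stride at that point. Choosing LL below UL - 1 would add a dead zone in which neither a prefetch nor a stride modification occurs.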
FIG. 4 illustrates an example block diagram of a stride prediction prefetch system 400 in accordance with this invention. A fetch controller 430 effects prefetching of data from a memory 450 to a cache 460, based on the contents of a control register 410 and the data access requests from a processor 420. The address of the requested data is used to determine whether the application is requesting data in a repetitive manner, exhibiting a consistent stride, or span of locations, between memory accesses, as discussed above. The control register 410 includes an address of a prior data access 416, the prior stride 412, and a counter 414. As noted above, two counters may be used in lieu of the single counter 414, to maintain an independent count of equal-stride and non-equal-stride accesses. The address of the currently requested data from the processor 420 is compared to the prior address 416 to determine a current stride. If the current stride corresponds to the prior stride 412, the counter 414 is incremented; otherwise, it is decremented. Depending upon the incremented or decremented value of the counter 414, the fetch controller determines whether to initiate a prefetch of data from a next-predicted location in the memory 450 to the cache 460. When a subsequent data request is for data that has been prefetched into the cache 460, the cache 460 provides the data directly to the processor, thereby avoiding the delay introduced by retrieving the data from the memory 450.
Also dependent upon the incremented or decremented value of the counter 414, the fetch controller 430 determines whether to modify the stride value 412, as detailed above. By determining whether to modify the stride value 412 based on a count that depends upon the number of non-equal strides, rather than modifying the stride value based on the occuπence of one non-equal stride as in a conventional system, the stride prediction prefetch system 400 is de-sensitized to intermittent breaks in stride.
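The single-counter policy described in the preceding paragraphs can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation; the class and method names are chosen for the example, and the threshold constants are the example values recited in claims 4 and 8 (maximum count three, upper limit two, lower limit one, minimum count zero).

```python
# Illustrative sketch of the counter-based stride predictor (hypothetical names).
# Example threshold values taken from claims 4 and 8.
MAX_COUNT, MIN_COUNT = 3, 0
UPPER_LIMIT, LOWER_LIMIT = 2, 1

class StridePredictor:
    """Models the control register 410: prior address, prior stride, counter."""

    def __init__(self):
        self.prev_address = None  # address of prior data access (416)
        self.stride = 0           # prior stride value (412)
        self.count = MIN_COUNT    # saturating counter (414)

    def access(self, address):
        """Process one memory access; return a prefetch address, or None."""
        prefetch = None
        if self.prev_address is not None:
            current_stride = address - self.prev_address
            if current_stride == self.stride:
                # Equal stride: increment, saturating at the maximum (326).
                self.count = min(self.count + 1, MAX_COUNT)
            else:
                # Non-equal stride: decrement, saturating at the minimum (322).
                self.count = max(self.count - 1, MIN_COUNT)
                if self.count <= LOWER_LIMIT:
                    # Sufficiently unreliable: adopt the new stride (230).
                    self.stride = current_stride
            if self.count >= UPPER_LIMIT:
                # Sufficiently reliable: prefetch the next-predicted location (130).
                prefetch = address + self.stride
        self.prev_address = address
        return prefetch
```

A short trace illustrates the de-sensitization to intermittent breaks in stride: after a run of equal-stride accesses (0, 4, 8, 12, 16), a single out-of-pattern access decrements the counter but leaves it above the lower limit, so the stride value 412 is retained and prefetching continues rather than being abandoned on the first non-equal stride.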
The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within the spirit and scope of the following claims.

Claims

What is claimed is:
1. A method of prefetching data from a memory (450) to a cache (460), comprising:
determining (326) a first measure of equal-stride memory accesses, based on a plurality of equal-stride memory accesses and a prior stride value (412),
determining (322) a second measure of non-equal-stride memory accesses, based on a plurality of non-equal-stride memory accesses and the prior stride value (412),
effecting (130) a prefetch of the data from the memory (450) based on the first measure, and
effecting (230) a modification of the prior stride value (412) based on the second measure.
2. The method of claim 1, wherein determining the first measure and the second measure is effected by maintaining a count (414) that is incremented (326) for each equal-stride memory access and decremented (322) for each non-equal-stride memory access, and effecting the prefetch and effecting the modification are each based (330) on the count (414).
3. The method of claim 2, wherein effecting (130) the prefetch occurs when (330) the count (414) is equal or above an upper limit, and effecting (230) the modification occurs when (330) the count (414) is equal or below a lower limit.
4. The method of claim 3, wherein the count (414) is limited to a maximum of three, the upper limit is two, and the lower limit is one.
5. A prefetch system (400) comprising:
a control register (410) that is configured to contain at least one measure (414) that corresponds to a consistency of stride values between requested memory accesses,
a prefetch controller (430) that is configured to prefetch data from a memory (450) to a cache (460), based on the measure of consistency,
wherein the consistency of stride values is dependent upon a comparison of a current stride with a prior stride value (412),
the prefetch controller (430) is further configured to modify the prior stride value (412) based on a measure of inconsistency, and
the measure of inconsistency is based on a plurality of non-equal-strides between requested memory accesses.
6. The prefetch system (400) of claim 5, wherein the measure of consistency and the measure of inconsistency correspond to a count (414) that is incremented upon each equal-stride requested memory access, up to a maximum count, and decremented upon each non-equal-stride requested memory access, down to a minimum count.
7. The prefetch system (400) of claim 6, wherein the prefetch controller (430) is configured to: prefetch the data when the count (414) is equal or above an upper threshold level, and modify the prior stride value (412) when the count (414) is equal or below a lower threshold level.
8. The prefetch system (400) of claim 7, wherein the maximum count is three, the upper threshold level is two, the lower threshold level is one, and the minimum count is zero.
9. A processing system, comprising:
a memory (450) that is configured to provide access to data based on an access address,
a cache (460), operably coupled to the memory (450), that is configured to store data that is accessed from the memory (450), to facilitate rapid access to the data,
a processor (420), operably coupled to the memory (450) and the cache (460), that is configured to provide the access address and to receive the data from the cache (460), if it is stored in the cache (460), or from the memory (450), if it is not stored in the cache (460), and
a fetch controller (430), operably coupled to the processor (420), the memory (450), and the cache (460), that is configured to effect a transfer of data from the memory (450) to the cache (460), based on the access address and a predicted stride value,
wherein the fetch controller (430) is further configured to maintain a measure of stride consistency that is based on repeat occurrences of equal stride values, and a measure of stride inconsistency that is based on repeat occurrences of unequal stride values, and
the fetch controller (430) effects the transfer of data based on the measure of stride consistency, and effects a modification of the predicted stride value based on the measure of stride inconsistency.
10. The processing system of claim 9, wherein the measure of stride consistency and the measure of stride inconsistency are each based on a count (414) that is incremented upon each equal-stride requested memory access, up to a maximum count, and decremented upon each non-equal-stride requested memory access, to a minimum count.
11. The processing system of claim 10, wherein the fetch controller (430) is configured to: effect the transfer of data when the count (414) is equal or above an upper threshold level, and effect the modification of the predicted stride value when the count (414) is equal or below a lower threshold level.
12. The processing system of claim 11, wherein the maximum count is three, the upper threshold level is two, the lower threshold level is one, and the minimum count is zero.
EP03778603A 2002-12-12 2003-12-09 Counter based stride prediction for data prefetch Withdrawn EP1576465A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US43275502P 2002-12-12 2002-12-12
US432755P 2002-12-12
PCT/IB2003/005796 WO2004053686A1 (en) 2002-12-12 2003-12-09 Counter based stride prediction for data prefetch

Publications (1)

Publication Number Publication Date
EP1576465A1 true EP1576465A1 (en) 2005-09-21

Family

ID=32507990

Family Applications (1)

Application Number Title Priority Date Filing Date
EP03778603A Withdrawn EP1576465A1 (en) 2002-12-12 2003-12-09 Counter based stride prediction for data prefetch

Country Status (6)

Country Link
EP (1) EP1576465A1 (en)
JP (1) JP2006510082A (en)
KR (1) KR20050084232A (en)
CN (1) CN1726459A (en)
AU (1) AU2003285604A1 (en)
WO (1) WO2004053686A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9274965B2 (en) 2008-12-15 2016-03-01 International Business Machines Corporation Prefetching data

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4212521B2 (en) * 2004-06-30 2009-01-21 株式会社東芝 Prefetch control device, prefetch control method and program for controlling prefetch of data to temporary storage device of processor
JP2009230374A (en) 2008-03-21 2009-10-08 Fujitsu Ltd Information processor, program, and instruction sequence generation method
WO2011060570A1 (en) * 2009-11-17 2011-05-26 华为技术有限公司 High-speed counter processing method and counter
US8433852B2 (en) * 2010-08-30 2013-04-30 Intel Corporation Method and apparatus for fuzzy stride prefetch
US9253282B2 (en) 2011-10-18 2016-02-02 Qualcomm Incorporated Method and apparatus for generating, using, or updating an enriched user profile
CN102385622B (en) * 2011-10-25 2013-03-13 曙光信息产业(北京)有限公司 Pre-reading method for stride access mode of file system
US20140189249A1 (en) * 2012-12-28 2014-07-03 Futurewei Technologies, Inc. Software and Hardware Coordinated Prefetch
US10379864B2 (en) 2016-12-26 2019-08-13 Intel Corporation Processor prefetch throttling based on short streams
US11194728B2 (en) 2019-07-29 2021-12-07 Micron Technology, Inc. Memory-aware pre-fetching and cache bypassing systems and methods

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU7666796A (en) * 1996-11-04 1998-05-29 Advanced Micro Devices Inc. A stride-based data address prediction structure
DE10121792C2 (en) * 2000-05-26 2003-09-25 Ibm Universal loading address / value prediction scheme

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2004053686A1 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9274965B2 (en) 2008-12-15 2016-03-01 International Business Machines Corporation Prefetching data
US10810125B2 (en) 2008-12-15 2020-10-20 International Business Machines Corporation Prefetching data

Also Published As

Publication number Publication date
JP2006510082A (en) 2006-03-23
AU2003285604A1 (en) 2004-06-30
WO2004053686A1 (en) 2004-06-24
CN1726459A (en) 2006-01-25
KR20050084232A (en) 2005-08-26

Similar Documents

Publication Publication Date Title
CN108459877B (en) Data processing
US6484239B1 (en) Prefetch queue
US7500063B2 (en) Method and apparatus for managing a cache memory in a mass-storage system
US6584549B2 (en) System and method for prefetching data into a cache based on miss distance
US6782454B1 (en) System and method for pre-fetching for pointer linked data structures
JP4060506B2 (en) Disk controller
EP1576465A1 (en) Counter based stride prediction for data prefetch
JPH0628180A (en) Prefetch buffer
US8356143B1 (en) Prefetch mechanism for bus master memory access
TWI259362B (en) Memory controller
JP4875981B2 (en) Prefetch control in data processing system
US20090217004A1 (en) Cache with prefetch
US7424562B2 (en) Intelligent PCI bridging consisting of prefetching data based upon descriptor data
WO2001063240A2 (en) Maintaining high snoop traffic throughput and preventing cache data eviction during an atomic operation
US20080104327A1 (en) Systems and Method for Improved Data Retrieval from Memory on Behalf of Bus Masters
JP4250989B2 (en) Memory access control device
US20060031633A1 (en) System method and circuit for retrieving data into a cache memory from a mass data storage device
US6286082B1 (en) Apparatus and method to prevent overwriting of modified cache entries prior to write back
US11086781B2 (en) Methods and apparatus for monitoring prefetcher accuracy information using a prefetch flag independently accessible from prefetch tag information
US6397304B1 (en) Method and apparatus for improving system performance in multiprocessor systems
JP2003044357A (en) Cache prefetch system
JPH10254775A (en) Memory controller having common cache memory
JPH06342401A (en) Secondary memory controller
JPH07219845A (en) Cache memory control system
JP2001154845A (en) Memory bus access control system after cache miss

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20050712

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20060307