EP1576465A1 - Counter based stride prediction for data prefetch - Google Patents
Counter based stride prediction for data prefetch
- Publication number
- EP1576465A1 (application EP03778603A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- stride
- count
- equal
- measure
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/345—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
- G06F9/3455—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results using stride
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/383—Operand prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/383—Operand prefetching
- G06F9/3832—Value prediction for operands; operand history buffers
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Advance Control (AREA)
Abstract
A prefetching system (400) includes hysteresis in the determination and modification of a stride value (412) that is used for prefetching data (130) in a sequential process. Once a stride value is determined, intermittent stride inconsistencies are ignored (322-330), and the stride value retains its prior value. When the stride inconsistencies become frequent (322-330), the stride value is modified (230). When the modified stride value becomes repetitive, the system adopts this value as the stride; subsequent stride inconsistencies are again ignored, and the stride value thereafter retains its current value until inconsistencies become frequent.
Description
COUNTER BASED STRIDE PREDICTION FOR DATA PREFETCH
This invention relates to the field of electronics, and specifically to a method and system for predicting a next data location in a process to facilitate a prefetch of data from that location.
Prefetching is a common technique for minimizing latency in a sequential process. Data that is expected to be needed at a future time in the process is retrieved from a memory and stored in a cache, for subsequent access at the future time. The cache is designed to provide a substantially faster access time than the memory. Thus, when the process needs the data, and the data is in the cache, the data is provided to the process at the higher cache-access speed. Conversely, if the data is not in the cache, the data is not provided to the process until after the substantially slower memory-access time, thereby introducing memory-access delays into the process.
A variety of techniques are commonly available to facilitate the prediction of the data that is going to be needed at a future time. One such technique is "stride prediction", wherein the location of the next data item to be needed is based on the sequence of locations of the prior-accessed data items. For example, data is often stored as an array or list of data records, such as records of employees' names, addresses, social security numbers, and so on. Typically, these records are fixed-length records, such that the beginning of each record is separated by a fixed number of memory locations from its adjacent records in the memory. An application that provides a printout or display of all employee names, for example, will sequentially access memory locations that are separated by this fixed number. Given that the application accesses a first employee name from the memory at location L, and accesses a second employee name at location L+S, it is likely that the application will access data located at location L+S+S of the memory to obtain the next employee name. If the data at location L+S+S is retrieved from memory and stored in a higher-speed cache memory before this access is requested, the third employee name can be provided to the application more quickly than the first or second employee names.
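As a minimal illustration of the arithmetic above (the addresses are hypothetical, and the function name is mine, not the patent's):

```python
# Minimal illustration of stride prediction: given two prior accesses,
# predict the next address by repeating the observed interval (stride).

def predict_next_address(prior_address: int, current_address: int) -> int:
    """Predict the next access at the same interval beyond the current
    address, i.e. given L and L+S, predict L+S+S."""
    stride = current_address - prior_address
    return current_address + stride

# Example: fixed-length 64-byte records starting at location 1000.
first, second = 1000, 1064              # L and L+S, so S = 64
predicted = predict_next_address(first, second)
assert predicted == 1128                # L+S+S, the likely third record
```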
FIG. 1 illustrates an example flow diagram of a prior art stride prediction prefetch process. At 110, the prefetch process is invoked, typically at the same time that an application invokes a request for access to a new data item. Not illustrated, this prefetch process may be called as part of a global scheme that includes multiple prefetching algorithms, and may be selectively invoked depending upon factors such as the proximity of sequential data accesses, and the like. Typically, a processor maintains a stride prediction table (SPT) that records information related to executed accesses to the memory, including the interval between sequential accesses, herein termed the stride of the accesses. The stride prediction table is typically configured to record multiple sets of information related to executed accesses to keep track of multiple potential strides. For ease of understanding, the invention is presented herein using the paradigm of a single set of information that is used to keep track of a single stride between related memory accesses. One of ordinary skill in the art will recognize that the principles presented herein are directly applicable to the conventional use of a stride prediction table that includes multiple sets of information related to multiple strides. The current stride is determined by the difference between the address of the prior/old access and the address of the new requested access, at 120. On the assumption that a next requested access will be equally spaced as the prior access, a prefetch is executed to fetch the data at an address at the same interval from the current/new address, at 130. At 140, the new address replaces the old address, in preparation for the next data access, and the prefetch routine terminates, at 150. By initiating the prefetch for the next-likely data at the time of requested access to the new data, the prefetched data will be present in the higher-speed cache when and if the application initiates an access request for this data. If the next-likely data is not the data that is subsequently requested, the cache will not contain the next-requested data, and a memory access will be required. The prefetch process of FIG. 1, however, initiates a prefetch on every access to new data, without regard to whether there is any basis for the assumption that the next-likely requested data will be equally spaced from the prior-requested data.
This causes substantial memory-access traffic, which can substantially reduce effective memory-access performance. FIG. 2 illustrates an example flow diagram of an improved prior art stride prediction prefetch process, wherein a prefetch is initiated if and only if two successive accesses exhibit the same stride. In this embodiment, at 220, the determined stride for the new access request is compared to the prior determined stride. If the new stride equals the old stride, the likelihood that the next stride will also be equal to these two strides is sufficiently high to warrant a prefetch, at 130. If the new stride differs from the old stride, the new stride replaces the old stride, at 230, in preparation for the next cycle. One of ordinary skill in the art will recognize that this two-in-a-row criterion for initiating a prefetch may be extended to three-in-a-row, four-in-a-row, and so on, to trade off between excessive memory-access traffic and the likelihood of having the next-requested data in cache.
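The FIG. 2 scheme described above might be sketched in software as follows; the class and method names are illustrative assumptions, not taken from the patent, and the block numbers from FIG. 2 are noted in comments:

```python
class TwoInARowPrefetcher:
    """Sketch of the FIG. 2 scheme: prefetch only when two successive
    accesses exhibit the same stride (an illustrative software model,
    not the patent's hardware)."""

    def __init__(self):
        self.prior_address = None   # address of the previous access
        self.prior_stride = None    # stride between the two previous accesses

    def access(self, address: int):
        """Record one access; return an address to prefetch, or None."""
        prefetch = None
        if self.prior_address is not None:
            stride = address - self.prior_address       # block 120
            if stride == self.prior_stride:             # block 220
                prefetch = address + stride             # block 130
            else:
                self.prior_stride = stride              # block 230
        self.prior_address = address                    # block 140
        return prefetch

p = TwoInARowPrefetcher()
assert p.access(100) is None    # first access: no stride yet
assert p.access(108) is None    # stride 8 seen once: no prefetch yet
assert p.access(116) == 124     # stride 8 twice in a row: prefetch 116+8
```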
It is an object of this invention to improve the likelihood of prefetching data that will subsequently be accessed by an application. It is a further object of this invention to provide an efficient prefetch scheme that is well suited for hardware implementation.
These objects and others are achieved by a prefetching system that includes hysteresis in the determination and modification of a stride value that is used for prefetching data in a sequential process. Once a stride value is determined, intermittent stride inconsistencies are ignored, and the stride value retains its prior value. When the stride inconsistencies become frequent, the stride value is modified. When the modified stride value becomes repetitive, the system adopts this value as the stride; subsequent stride inconsistencies are again ignored, and the stride value thereafter retains its current value until inconsistencies become frequent.
FIG. 1 illustrates an example flow diagram of a prior art stride prediction prefetch process.
FIG. 2 illustrates an example flow diagram of an alternative prior art stride prediction prefetch process.
FIG. 3 illustrates an example flow diagram of a stride prediction prefetch process in accordance with this invention.
FIG. 4 illustrates an example block diagram of a stride prediction prefetch system in accordance with this invention. Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions.
This invention is premised on the observation that regular stride patterns in memory accesses often include intermittent data accesses that do not conform to the stride pattern. For example, a nested loop structure may include an inner loop that cycles through a list of data, and an outer loop that presets one or more variables that are used in each cycle of the inner loop, or an outer loop that stores a result from each cycle of the inner loop. In this example, the inner loop that is cycling through the list of data will likely exhibit a constant stride. At each re-commencement of the inner loop, however, the memory access is to the start of the list, whereas the prior access was to the end of the list. The span between the end of the list and the start of the list, however, will not correspond to the inner loop's stride. Additionally, a data access by the outer loop at the beginning or end of the loop will produce a span between accesses that does not correspond to the inner loop's stride. Other examples of an intermittent break in stride include the processing of data that is organized in a multi-dimension array. Typically, the data is processed for a given range along one dimension, then an index to another dimension is incremented, and the data at this next indexed other dimension is processed for the given range along the first dimension. The stride along the range of the first dimension will generally be constant, but the increment of the index to the next dimension will likely result in an access having a span from the prior access that does not match the stride along the first dimension.
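The row-by-row traversal just described can be made concrete with a small sketch; the array dimensions and byte addresses here are hypothetical:

```python
# Sketch of the intermittent stride break described above: traversing a
# padded 4x4 array row by row gives a constant stride within each row and
# a different span at each row boundary. All addresses are hypothetical.

ELEM = 4         # bytes per element
COLS = 4         # elements per row
ROW_PITCH = 32   # bytes between row starts (rows are padded)
ROWS = 4

addresses = [row * ROW_PITCH + col * ELEM
             for row in range(ROWS) for col in range(COLS)]
strides = [b - a for a, b in zip(addresses, addresses[1:])]

assert strides.count(ELEM) == ROWS * (COLS - 1)   # 12 in-row strides of 4
boundary = ROW_PITCH - (COLS - 1) * ELEM          # span at each new row: 20
assert strides.count(boundary) == ROWS - 1        # 3 intermittent breaks
```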
In a conventional stride prediction process, whenever the stride is 'broken' or interrupted, the process of determining the stride is repeated to determine a new stride. During the time that the new stride is being determined, prefetches do not occur, and the application is delayed by memory-accesses at each re-start of an inner loop, or each incrementing of a higher-level index during the processing of a multi-dimension array.
In accordance with this invention, the stride value is preserved during intermittent breaks in stride. Performing a prefetch of data at the current stride is dependent upon the number of equal-value strides within a given number of memory accesses, and adjusting the stride value is dependent upon the occurrence of multiple non-equal-value strides. In a simple example, a prefetch of data may be performed whenever two out of three accesses have successive equal strides, and a modification of the stride value may be performed whenever two accesses have successive unequal strides.
FIG. 3 illustrates an example flow diagram of a stride prediction prefetch process in accordance with this invention. In this example, a "count" parameter is used to count the number of successive strides of the same value, up to a selected maximum. In a preferred embodiment of this invention, this count parameter is also used to distinguish between intermittent breaks in stride and an actual, persistent, change of stride. It will be evident to one of ordinary skill in the art in view of this disclosure that an independent parameter could be employed to count the number of successive non-equal strides.
After determining the current stride between the new access address and the prior access address, at 120, the process of this invention compares the current stride with the prior stride, at 220. If the current stride equals the prior stride, the count is incremented, at 326; if not, the current count is decremented, at 322. In a preferred embodiment, the count is limited to a value from zero to a maximum count. Blocks 328 and 324 clip the incremented or decremented count from blocks 326 and 322, respectively, to remain within these limits.
At 330, the count is compared to an upper limit, UL, and to a lower limit, LL, wherein the lower limit LL is preferably less than the upper limit UL. In this example, the upper limit UL corresponds to the number of equal-stride occurrences required to warrant a prefetch from the next stride-predicted access location. If the count equals or exceeds this upper limit UL, a prefetch of data from an address corresponding to the current address plus the current stride is executed, at 130. For example, if the upper limit UL is a value of two, the prefetch is not executed, initially, unless two-in-a-row of the same stride value occurs. If the upper limit is a value of three, the prefetch is not executed until three-in-a-row of the same stride value occurs, and so on. Thereafter, subsequent equal-valued strides continue to increment the count, up to the maximum, via 326, 328. In a preferred embodiment, the maximum value is selected as the number of successive equal-value strides required to conclude that the current stride value is 'reliable'.
Each time that a non-equal stride occurs, the count is decremented, at 322. Thus, the difference between the maximum count and the current count corresponds to a measure of 'unreliability' of the current stride value. The lower limit is selected as the value of the current count that constitutes a sufficient measure of unreliability to warrant a change to the current stride value, at 230. Generally, as indicated in FIG. 3, if the current stride value is deemed unreliable, at 230, a prefetch, at 130, is not performed. In like manner, if the current stride value is sufficiently reliable to warrant a prefetch, at 130, a modification of the current stride value at 230 is not performed. Thus, the lower limit LL is set to be less than the upper limit UL, and generally set to be UL-1. Optionally, as indicated by the dashed connection between the test block 330 and the block 140, the lower limit LL may be selected relative to the upper limit UL such that the assessed reliability of the current stride is not sufficient to warrant a prefetch, at 130 (count < UL), while the assessed unreliability is not sufficient to warrant a modification of the current stride, at 230 (count > LL).
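The FIG. 3 flow described above can be sketched as follows. This is an illustrative software model under assumed limits (MAX=3, UL=2, LL=1), not the patent's hardware; the block numbers from FIG. 3 are noted in comments:

```python
class CounterStridePrefetcher:
    """Sketch of the FIG. 3 counter-based scheme. The count saturates
    between 0 and MAX; a prefetch is issued when count >= UL, and the
    stored stride is replaced only when count <= LL, so a single
    intermittent break does not discard a reliable stride."""

    MAX = 3   # saturation ceiling: stride deemed fully 'reliable'
    UL = 2    # upper limit: count needed to warrant a prefetch (block 130)
    LL = 1    # lower limit: count at which the stride is replaced (block 230)

    def __init__(self):
        self.prior_address = None
        self.stride = None
        self.count = 0

    def access(self, address: int):
        """Process one access; return an address to prefetch, or None."""
        prefetch = None
        if self.prior_address is not None:
            current_stride = address - self.prior_address      # block 120
            if current_stride == self.stride:                  # block 220
                self.count = min(self.count + 1, self.MAX)     # 326/328
            else:
                self.count = max(self.count - 1, 0)            # 322/324
            if self.count >= self.UL:                          # block 330
                prefetch = address + self.stride               # block 130
            elif self.count <= self.LL:
                self.stride = current_stride                   # block 230
        self.prior_address = address                           # block 140
        return prefetch

# Steady stride 8 with one intermittent break at address 100 (e.g. an
# inner-loop restart). The stored stride survives the break, and the
# prefetch issued at the break (100+8=108) is actually useful.
p = CounterStridePrefetcher()
seq = [0, 8, 16, 24, 32, 100, 108, 116]
assert [p.access(a) for a in seq] == [None, None, None, 32, 40, 108, 116, 124]
```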
FIG. 4 illustrates an example block diagram of a stride prediction prefetch system 400 in accordance with this invention. A fetch controller 430 effects prefetching of data from a memory 450 to a cache 460, based on the contents of a control register 410 and the data access requests from a processor 420. The address of the requested data is used to determine whether the application is requesting data in a repetitive manner, exhibiting a consistent stride, or span of locations, between memory accesses, as discussed above. The control register 410 includes an address of a prior data access 416, the prior stride 412, and a counter 414. As noted above, two counters may be used in lieu of the single counter 414, to maintain an independent count of equal-stride and non-equal-stride accesses. The address of the currently requested data from the processor 420 is compared to the prior address 416 to determine a current stride. If the current stride corresponds to the prior stride 412, the counter 414 is incremented; otherwise, it is decremented. Depending upon the incremented or decremented value of the counter 414, the fetch controller determines whether to initiate a prefetch of data from a next-predicted location in the memory 450 to the cache 460. When a subsequent data request is for data that has been prefetched into the cache 460, the cache 460 provides the data directly to the processor, thereby avoiding the delay introduced by retrieving the data from the memory 450.
Also dependent upon the incremented or decremented value of the counter 414, the fetch controller 430 determines whether to modify the stride value 412, as detailed above. By determining whether to modify the stride value 412 based on a count that depends upon the number of non-equal strides, rather than modifying the stride value based on the occurrence of one non-equal stride as in a conventional system, the stride prediction prefetch system 400 is de-sensitized to intermittent breaks in stride.
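The de-sensitization to intermittent breaks can be shown with a minimal self-contained simulation. The register fields mirror the prior address 416, the prior stride 412, and the counter 414, and the limit values again follow the examples of claims 4 and 8; none of these names or values are mandated by the text:

```python
# Minimal simulation of the control-register-based flow described above.
# The dictionary fields mirror the prior address (416), prior stride (412),
# and counter (414); the limit values are the assumed examples of claims 4 and 8.
MAX_COUNT, MIN_COUNT, UL, LL = 3, 0, 2, 1

def fetch_step(reg, address):
    """One fetch-controller step; returns a predicted prefetch address or None."""
    stride = address - reg["prior_address"]
    if stride == reg["stride"]:
        reg["count"] = min(reg["count"] + 1, MAX_COUNT)
    else:
        reg["count"] = max(reg["count"] - 1, MIN_COUNT)
        if reg["count"] <= LL:
            reg["stride"] = stride  # only a persistent break changes the stride
    reg["prior_address"] = address
    return address + reg["stride"] if reg["count"] >= UL else None

# Steady state on a stride-8 stream, then one intermittent break at address 100:
reg = {"prior_address": 0, "stride": 8, "count": MAX_COUNT}
for addr in (8, 16, 100):
    fetch_step(reg, addr)
# A conventional single-miss scheme would discard the stride at address 100;
# here the counter merely drops from 3 to 2 and the learned stride of 8 survives.
```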
The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within the spirit and scope of the following claims.
Claims
1. A method of prefetching data from a memory (450) to a cache (460), comprising: determining (326) a first measure of equal-stride memory accesses, based on a plurality of equal-stride memory accesses and a prior stride value (412), determining (322) a second measure of non-equal-stride memory accesses, based on a plurality of non-equal-stride memory accesses and the prior stride value (412), effecting (130) a prefetch of the data from the memory (450) based on the first measure, and effecting (230) a modification of the prior stride value (412) based on the second measure.
2. The method of claim 1, wherein determining the first measure and the second measure is effected by maintaining a count (414) that is incremented (326) for each equal-stride memory access and decremented (322) for each non-equal-stride memory access, and effecting the prefetch and effecting the modification are each based (330) on the count (414).
3. The method of claim 2, wherein effecting (130) the prefetch occurs when (330) the count (414) is equal to or above an upper limit, and effecting (230) the modification occurs when (330) the count (414) is equal to or below a lower limit.
4. The method of claim 3, wherein the count (414) is limited to a maximum of three, the upper limit is two, and the lower limit is one.
5. A prefetch system (400) comprising: a control register (410) that is configured to contain at least one measure (414) that corresponds to a consistency of stride values between requested memory accesses, a prefetch controller (430) that is configured to prefetch data from a memory (450) to a cache (460), based on the measure of consistency, wherein the consistency of stride values is dependent upon a comparison of a current stride with a prior stride value (412), the prefetch controller (430) is further configured to modify the prior stride value (412) based on a measure of inconsistency, and the measure of inconsistency is based on a plurality of non-equal strides between requested memory accesses.
6. The prefetch system (400) of claim 5, wherein the measure of consistency and the measure of inconsistency correspond to a count (414) that is incremented upon each equal-stride requested memory access, up to a maximum count, and decremented upon each non-equal-stride requested memory access, down to a minimum count.
7. The prefetch system (400) of claim 6, wherein the prefetch controller (430) is configured to: prefetch the data when the count (414) is equal to or above an upper threshold level, and modify the prior stride value (412) when the count (414) is equal to or below a lower threshold level.
8. The prefetch system (400) of claim 7, wherein the maximum count is three, the upper threshold level is two, the lower threshold level is one, and the minimum count is zero.
9. A processing system, comprising: a memory (450) that is configured to provide access to data based on an access address, a cache (460), operably coupled to the memory (450), that is configured to store data that is accessed from the memory (450), to facilitate rapid access to the data, a processor (420), operably coupled to the memory (450) and the cache (460), that is configured to provide the access address and to receive the data from the cache (460), if it is stored in the cache (460), or from the memory (450), if it is not stored in the cache (460), and a fetch controller (430), operably coupled to the processor (420), the memory (450), and the cache (460), that is configured to effect a transfer of data from the memory (450) to the cache (460), based on the access address and a predicted stride value, wherein the fetch controller (430) is further configured to maintain a measure of stride consistency that is based on repeat occurrences of equal stride values, and a measure of stride inconsistency that is based on repeat occurrences of unequal stride values, and the fetch controller (430) effects the transfer of data based on the measure of stride consistency, and effects a modification of the predicted stride value based on the measure of stride inconsistency.
10. The processing system of claim 9, wherein the measure of stride consistency and the measure of stride inconsistency are each based on a count (414) that is incremented upon each equal-stride requested memory access, up to a maximum count, and decremented upon each non-equal-stride requested memory access, to a minimum count.
11. The processing system of claim 10, wherein the fetch controller (430) is configured to: effect the transfer of data when the count (414) is equal to or above an upper threshold level, and effect the modification of the predicted stride value when the count (414) is equal to or below a lower threshold level.
12. The processing system of claim 11, wherein the maximum count is three, the upper threshold level is two, the lower threshold level is one, and the minimum count is zero.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US43275502P | 2002-12-12 | 2002-12-12 | |
US432755P | 2002-12-12 | ||
PCT/IB2003/005796 WO2004053686A1 (en) | 2002-12-12 | 2003-12-09 | Counter based stride prediction for data prefetch |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1576465A1 true EP1576465A1 (en) | 2005-09-21 |
Family
ID=32507990
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP03778603A Withdrawn EP1576465A1 (en) | 2002-12-12 | 2003-12-09 | Counter based stride prediction for data prefetch |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1576465A1 (en) |
JP (1) | JP2006510082A (en) |
KR (1) | KR20050084232A (en) |
CN (1) | CN1726459A (en) |
AU (1) | AU2003285604A1 (en) |
WO (1) | WO2004053686A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4212521B2 (en) * | 2004-06-30 | 2009-01-21 | 株式会社東芝 | Prefetch control device, prefetch control method and program for controlling prefetch of data to temporary storage device of processor |
JP2009230374A (en) | 2008-03-21 | 2009-10-08 | Fujitsu Ltd | Information processor, program, and instruction sequence generation method |
WO2011060570A1 (en) * | 2009-11-17 | 2011-05-26 | 华为技术有限公司 | High-speed counter processing method and counter |
US8433852B2 (en) * | 2010-08-30 | 2013-04-30 | Intel Corporation | Method and apparatus for fuzzy stride prefetch |
US9253282B2 (en) | 2011-10-18 | 2016-02-02 | Qualcomm Incorporated | Method and apparatus for generating, using, or updating an enriched user profile |
CN102385622B (en) * | 2011-10-25 | 2013-03-13 | 曙光信息产业(北京)有限公司 | Pre-reading method for stride access mode of file system |
US20140189249A1 (en) * | 2012-12-28 | 2014-07-03 | Futurewei Technologies, Inc. | Software and Hardware Coordinated Prefetch |
US10379864B2 (en) | 2016-12-26 | 2019-08-13 | Intel Corporation | Processor prefetch throttling based on short streams |
US11194728B2 (en) | 2019-07-29 | 2021-12-07 | Micron Technology, Inc. | Memory-aware pre-fetching and cache bypassing systems and methods |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU7666796A (en) * | 1996-11-04 | 1998-05-29 | Advanced Micro Devices Inc. | A stride-based data address prediction structure |
DE10121792C2 (en) * | 2000-05-26 | 2003-09-25 | Ibm | Universal loading address / value prediction scheme |
2003
- 2003-12-09 JP JP2004558276A patent/JP2006510082A/en not_active Abandoned
- 2003-12-09 EP EP03778603A patent/EP1576465A1/en not_active Withdrawn
- 2003-12-09 KR KR1020057010495A patent/KR20050084232A/en not_active Application Discontinuation
- 2003-12-09 WO PCT/IB2003/005796 patent/WO2004053686A1/en not_active Application Discontinuation
- 2003-12-09 AU AU2003285604A patent/AU2003285604A1/en not_active Abandoned
- 2003-12-09 CN CNA2003801057914A patent/CN1726459A/en active Pending
Non-Patent Citations (1)
Title |
---|
See references of WO2004053686A1 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9274965B2 (en) | 2008-12-15 | 2016-03-01 | International Business Machines Corporation | Prefetching data |
US10810125B2 (en) | 2008-12-15 | 2020-10-20 | International Business Machines Corporation | Prefetching data |
Also Published As
Publication number | Publication date |
---|---|
JP2006510082A (en) | 2006-03-23 |
AU2003285604A1 (en) | 2004-06-30 |
WO2004053686A1 (en) | 2004-06-24 |
CN1726459A (en) | 2006-01-25 |
KR20050084232A (en) | 2005-08-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108459877B (en) | Data processing | |
US6484239B1 (en) | Prefetch queue | |
US7500063B2 (en) | Method and apparatus for managing a cache memory in a mass-storage system | |
US6584549B2 (en) | System and method for prefetching data into a cache based on miss distance | |
US6782454B1 (en) | System and method for pre-fetching for pointer linked data structures | |
JP4060506B2 (en) | Disk controller | |
EP1576465A1 (en) | Counter based stride prediction for data prefetch | |
JPH0628180A (en) | Prefetch buffer | |
US8356143B1 (en) | Prefetch mechanism for bus master memory access | |
TWI259362B (en) | Memory controller | |
JP4875981B2 (en) | Prefetch control in data processing system | |
US20090217004A1 (en) | Cache with prefetch | |
US7424562B2 (en) | Intelligent PCI bridging consisting of prefetching data based upon descriptor data | |
WO2001063240A2 (en) | Maintaining high snoop traffic throughput and preventing cache data eviction during an atomic operation | |
US20080104327A1 (en) | Systems and Method for Improved Data Retrieval from Memory on Behalf of Bus Masters | |
JP4250989B2 (en) | Memory access control device | |
US20060031633A1 (en) | System method and circuit for retrieving data into a cache memory from a mass data storage device | |
US6286082B1 (en) | Apparatus and method to prevent overwriting of modified cache entries prior to write back | |
US11086781B2 (en) | Methods and apparatus for monitoring prefetcher accuracy information using a prefetch flag independently accessible from prefetch tag information | |
US6397304B1 (en) | Method and apparatus for improving system performance in multiprocessor systems | |
JP2003044357A (en) | Cache prefetch system | |
JPH10254775A (en) | Memory controller having common cache memory | |
JPH06342401A (en) | Secondary memory controller | |
JPH07219845A (en) | Cache memory control system | |
JP2001154845A (en) | Memory bus access control system after cache miss |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed |
Effective date: 20050712 |
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR |
AX | Request for extension of the european patent |
Extension state: AL LT LV MK |
DAX | Request for extension of the european patent (deleted) | ||
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn |
Effective date: 20060307 |