US20140351518A1 - Multi-level cache tracking table - Google Patents

Multi-level cache tracking table

Info

Publication number
US20140351518A1
Authority
US
United States
Prior art keywords: data, unit, tracking table, processor, hierarchy
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/902,366
Other versions
US9514044B2
Inventor
Jichuan Chang
Doe Hyun Yoon
Parthasarathy Ranganathan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Enterprise Development LP
Original Assignee
Hewlett Packard Development Co LP
Application filed by Hewlett-Packard Development Company, L.P.
Priority: US13/902,366, granted as US9514044B2
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. (assignors: CHANG, JICHUAN; RANGANATHAN, PARTHASARATHY; YOON, DOE HYUN)
Publication of US20140351518A1
Assigned to HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP (assignor: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.)
Application granted
Publication of US9514044B2
Legal status: Active

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806: Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0811: Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • G06F 12/0893: Caches characterised by their organisation or structure
    • G06F 12/0897: Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • G06F 12/0888: Addressing using selective caching, e.g. bypass



Abstract

Disclosed herein are a computing system, integrated circuit, and method to enhance retrieval of cached data. A tracking table is used to initiate a search for data from a location specified in the table, if the data is not in a first level of a multi-level cache hierarchy.

Description

    BACKGROUND
  • Processors may be accompanied by cache memory to reduce the average data retrieval time of the processor. Processor cache may have a multi-level hierarchy such that data may be moved to deeper levels of the hierarchy as the data is used less frequently. The data may be evicted from the hierarchy altogether if it remains idle in the hierarchy for a certain amount of time.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of an example cache hierarchy in accordance with aspects of the present disclosure.
  • FIG. 2 is a flow diagram of an example method in accordance with aspects of the present disclosure.
  • FIG. 3 is a block diagram of an example tracking table schema in accordance with aspects of the present disclosure.
  • FIG. 4 is a block diagram of a cache coherent multi-core processor in accordance with aspects of the present disclosure.
  • DETAILED DESCRIPTION
  • As noted above, processors may be equipped with a multi-level cache hierarchy. Data may be cached in a first level of the hierarchy and sequentially evicted to deeper levels as the data is used less frequently. Thus, data may be cached in a level of the hierarchy that coincides with its usage. If a unit of data remains idle for a certain amount of time, the unit of data may be completely evicted from the hierarchy and returned to main memory. However, when a processor requires a unit of data residing in the deepest level, the processor may search the hierarchy sequentially from the first level until the deepest level is reached. Unfortunately, these sequential searches may hinder a processor's performance. The effect on performance may be particularly noticeable in processors with very deep cache hierarchies. Furthermore, after using data retrieved from a deep cache level, the processor may simply return the data to the first level, where it may be used sporadically until it is returned to deeper levels. However, higher cache levels should be reserved for data that is used more frequently.
  • In view of the foregoing, disclosed herein are a computing system, integrated circuit, and method for enhancing the retrieval of data cached in a multi-level cache hierarchy. In one example, a tracking table may be used to initiate a search for data from a location specified in the tracking table, if the data is not in a first level of the multi-level cache hierarchy. Thus, rather than searching sequentially from the first level in the hierarchy, the techniques disclosed herein permit the processor to begin its search from a more precise level. Furthermore, the tracking table location history may be used to evict a unit of data to a level that coincides with its usage. Such evictions may reserve higher levels for more frequently used data. Moreover, the techniques disclosed herein may also enhance the location of data in shared cache hierarchies during cache coherence transactions. The aspects, features and other advantages of the present disclosure will be appreciated when considered with reference to the following description of examples and accompanying figures. The following description does not limit the application; rather, the scope of the disclosure is defined by the appended claims and equivalents.
  • FIG. 1 presents a schematic diagram of an illustrative system 100 for executing the techniques disclosed herein. Processor 110 may be any of a number of well-known processors, such as processors from Intel® Corporation. In another example, processor 110 may be an application specific integrated circuit (“ASIC”). Main memory 120 may comprise any data storage device accessible by the processor, such as a random access memory (“RAM”) device or a dynamic random access memory (“DRAM”) device. Alternatively, main memory 120 may comprise a hard drive, ROM, CD-ROM, flash memory, or other write-capable or read-only memories. In a further example, main memory 120 may be divided into multiple memory segments organized as dual in-line memory modules (“DIMMs”).
  • FIG. 1 also shows an example multi-level cache hierarchy comprising a first cache level 112, a second cache level 114, a third cache level 116, and a fourth cache level 118. These levels may be arranged in a variety of ways; for example, each level may be positioned inside or outside processor 110. Alternatively, first cache level 112 may be positioned inside processor 110 while the other cache levels remain outside the processor. As will be discussed further below, first level cache 112 may be used exclusively by processor 110 and the other cache levels may be shared with other processors. Thus, it is understood that FIG. 1 is for illustrative purposes and that a variety of configurations may be employed.
  • Also shown in FIG. 1 are example tracking tables. Tracking table 124 may comprise circuitry to maintain location information and location history for each unit of data that moves between main memory 120 and levels of the cache hierarchy. The circuitry of tracking table 124 may update the tracking history associated with a unit of data when the data moves to a different level of the hierarchy or when the unit is evicted from the hierarchy. In a further example, processor 110 may begin a search for a unit of data from a location specified in tracking table 124 if the data is not in first level cache 112 (i.e., if there is a first level cache miss). That is, rather than searching the hierarchy sequentially, processor 110 may initiate a search for a given unit of data from a physical location specified in tracking table 124. The history in the tracking table may include one or multiple previous locations in which a unit of data resided.
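As a rough software analogy (the patent describes dedicated circuitry, not software), the tracking table's bookkeeping might look like the following sketch. The class name, the dictionary representation, and the history depth are all illustrative assumptions:

```python
# Illustrative sketch of a tracking table that records, per unit of data,
# its current location and a short history of previous locations.
# Level 0 denotes main memory; levels 1..4 are the cache levels of FIG. 1.

MAIN_MEMORY = 0

class TrackingTable:
    def __init__(self, history_depth=4):
        self.entries = {}              # physical address -> (location, history)
        self.history_depth = history_depth

    def update(self, addr, new_location):
        """Record a move of the unit at `addr` to `new_location`
        (a cache level, or MAIN_MEMORY on eviction from the hierarchy)."""
        location, history = self.entries.get(addr, (MAIN_MEMORY, []))
        history = ([location] + history)[: self.history_depth]
        self.entries[addr] = (new_location, history)

    def lookup(self, addr):
        """Return the last known location, or MAIN_MEMORY if untracked."""
        return self.entries.get(addr, (MAIN_MEMORY, []))[0]

t = TrackingTable()
t.update(0x1000, 1)   # unit cached into the first level
t.update(0x1000, 4)   # later moved down to the fourth level
```

After the two updates, a lookup returns level 4 directly, and the stored history records that the unit previously sat in level 1 and, before that, main memory.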
  • By way of example, if a unit of data resides in fourth level cache 118, tracking table 124 may notify processor 110 of the data's location such that the processor goes directly to fourth level cache 118 rather than searching sequentially from the first level. Tracking table 124 may not always have the exact location of the data, but it may still provide a more precise starting point than the first level. For example, tracking table 124 may notify processor 110 that the unit of data resides in third level cache 116. In this instance, processor 110 may begin its search in third level cache 116 and then go to fourth level cache 118. Tracking table 124 may be in near proximity to first level cache 112. Being physically near the first level may further enhance the retrieval of data, since the processor may always search for data in the first level before resorting to the tracking table.
  • In addition to enhancing the retrieval of data stored in the hierarchy, tracking table 124 may be used to determine the appropriate level for cached data. In one example, when the data is evicted from its current level (e.g., first level cache 112), processor 110 may move the unit of data to a different location that is determined based at least partially on the history of movements in tracking table 124. The new location may be determined in a variety of ways. For example, the processor may evict the data to its prior location. The history in the tracking table may be used as a predictor of the data's future usage. When a conventional processor has finished using data obtained from a deep cache level, it may simply place the data back in the first level, even though the data's future usage may be infrequent.
  • In another aspect, at least one victim tracking table may be used to store data evicted from tracking table 124. History associated with a unit of data may be evicted from tracking table 124 when the history is referenced less frequently by processor 110. FIG. 1 shows victim tracking table 126 and victim tracking table 128. As will be discussed further below, each unit of data may be associated with a data block in tracking table 124. In one example, processor 110 may evict a unit of data from the cache hierarchy, when the data block associated with the unit is evicted from tracking table 124 and victim tracking tables 126 and 128. As with the levels of the cache hierarchy, the tracking table and its associated victim tracking tables may be arranged in a variety of ways. For example, tracking table 124 may be arranged inside processor 110 and victim tracking tables 126 and 128 may be placed outside processor 110.
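The cascade from tracking table 124 into victim tracking tables 126 and 128 can be sketched with small least-recently-used tables. The LRU replacement policy, the capacities, and all names are assumptions made for illustration:

```python
from collections import OrderedDict

class SmallTable:
    """Fixed-capacity table that evicts its least-recently-used entry."""
    def __init__(self, capacity):
        self.capacity, self.data = capacity, OrderedDict()

    def insert(self, addr, entry):
        self.data[addr] = entry
        self.data.move_to_end(addr)           # mark as most recently used
        if len(self.data) > self.capacity:
            return self.data.popitem(last=False)  # (addr, entry) victim
        return None

def insert_with_victims(addr, entry, tables):
    """Insert into the main tracking table; cascade each LRU victim into
    the next victim table. An item falling off the last table means the
    unit's history is gone, and the unit itself may then be evicted from
    the cache hierarchy, as the patent describes."""
    item = (addr, entry)
    for table in tables:
        item = table.insert(*item)
        if item is None:
            return None
    return item  # history fully evicted

# Tracking table 124 plus victim tables 126 and 128 (tiny capacities here).
tables = [SmallTable(1), SmallTable(1), SmallTable(1)]
insert_with_victims(0x10, "hist-A", tables)
insert_with_victims(0x20, "hist-B", tables)
```

With capacity-one tables, a third insert still fits (the oldest history lands in the last victim table), and a fourth insert pushes the oldest history out entirely.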
  • FIG. 1 also shows an example translation lookaside buffer (“TLB”) 122 that may be used by the processor to translate virtual addresses into physical addresses. TLB 122 may be implemented as content-addressable memory (“CAM”) such that the CAM search key is a virtual address received from a program (e.g., an operating system) and the output is a physical address of the unit of data in main memory 120. Even when the unit of data is cached in the multi-level hierarchy, the physical address returned by TLB 122 still refers to an address in main memory 120. As will be discussed in more detail below with regard to FIG. 3, this physical address may be used to find the location and location history of the unit of data in tracking table 124.
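A minimal software model of the TLB lookup, assuming 4 KiB pages and a plain dictionary standing in for the real CAM hardware (both assumptions; the patent does not fix a page size):

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

class TLB:
    """Sketch of a TLB modeled as content-addressable memory: the search
    key is a virtual page number and the result is a physical frame.
    The returned physical address refers to main memory even when the
    data itself currently resides in a cache level."""
    def __init__(self):
        self.map = {}  # virtual page number -> physical frame number

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        frame = self.map.get(vpn)
        if frame is None:
            raise KeyError("TLB miss: a page-table walk would be needed")
        return frame * PAGE_SIZE + offset
```

On a hit, the page offset is carried through unchanged; on a miss, a real processor would walk the page tables, which this sketch signals with an exception.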
  • FIG. 2 illustrates a flow diagram that summarizes an example method 200 for enhancing the retrieval of cached objects. As shown in block 202 of FIG. 2, it is determined whether data is cached in a first level of a multi-level cache hierarchy. Referring back to FIG. 1, processor 110 may search for a unit of data in first level cache 112. Referring back to FIG. 2, if it is determined that the data is not cached in a first level cache, a search for data may be initiated from a location specified in the history, as shown in block 204. Referring back to FIG. 1, rather than searching sequentially from second level cache 114 to fourth level cache 118, processor 110 may search tracking table 124 to obtain a more precise location of the sought after data.
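The flow of method 200 can be sketched as follows, modeling cache levels as sets of addresses and the tracking table as a dictionary of location hints; all names and the four-level depth are illustrative:

```python
def find_unit(addr, first_level, tracking_hint, hierarchy):
    """Sketch of method 200: probe the first level; on a miss, resume the
    search at the level recorded for `addr` in `tracking_hint` instead of
    walking levels 2..4 sequentially. Returns the level where the unit was
    found, 1 for a first-level hit, or 0 for main memory."""
    if addr in first_level:
        return 1
    start = max(tracking_hint.get(addr, 2), 2)  # no hint: start at level 2
    for level in range(start, 5):
        if addr in hierarchy.get(level, set()):
            return level
    return 0
```

Note that even an imprecise hint helps: if the table says level 3 but the data has since moved to level 4, the search still skips level 2 and finds the data one probe later.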
  • Referring now to FIG. 3, a close-up illustration of tracking table 124 is shown. Here, tracking table 124 is shown having rows of data broken up into individual data blocks. In one example, the location and location history associated with each unit of data may be stored in a data block of tracking table 124 that corresponds to each unit of data. In another example, each data block in tracking table 124 associated with a given unit of data may have a similar size as that of each data block in TLB 122 associated with the same unit of data. An insert of a data block in TLB 122 for a unit of data may cause an insert of a corresponding data block in tracking table 124. This may ensure that each unit of data that the processor may potentially use is accounted for in the tracking table. The similarity in data block size between tracking table 124 and TLB 122 may facilitate the synchronicity therebetween. In a further example, each row of data in tracking table 124 may be 4 kilobytes and each data block may be 64 bytes.
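With the example sizes above, each row holds 64 data blocks, which in turn fixes how wide the column-index and block-offset fields of the physical address would need to be:

```python
ROW_BYTES = 4 * 1024      # each tracking-table row, per the example
BLOCK_BYTES = 64          # each data block, matching the TLB block size
BLOCKS_PER_ROW = ROW_BYTES // BLOCK_BYTES

# 64 blocks per row means a 6-bit column index selects a block within a
# row, and a 64-byte block means a 6-bit offset selects a byte within it.
COLUMN_BITS = BLOCKS_PER_ROW.bit_length() - 1
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1
```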
  • As noted above, each row in tracking table 124 may comprise multiple data blocks. Upon receiving a virtual address from a program, processor 110 may utilize TLB 122 to translate the virtual address to a physical address in main memory 120. If the unit of data is cached, processor 110 may initially search for the unit of data in first level cache 112. If the search results in a cache miss, processor 110 may search tracking table 124 using the physical address returned from TLB 122. By way of example, physical address 301 shown in FIG. 3 may be received by tracking table 124 from processor 110. Tracking table 124 may parse physical address 301 to locate the data block that corresponds to the unit of data sought by the processor. In one example, tracking table 124 may parse tag field 302, row index field 304, and column index field 306. Tracking table 124 may concatenate tag field 302 and row index field 304 to locate the row containing the data block associated with the sought after unit of data. In the example of FIG. 3, the unit of data in question is located in row 310. The tracking table may then use column index field 306 to locate the block of data in row 310. In the example of FIG. 3, the sought after data block is data block 312. Tracking table 124 may find the location of the sought after unit of data in data block 312 and return it to processor 110. In one example, tracking table 124 may return the data's last known location (e.g., in the cache hierarchy or in main memory). Processor 110 may then use the data block offset field 308 to locate the data within a cache line of the cache level, if the location provided by the tracking table is a level in the hierarchy.
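The field extraction of FIG. 3 can be sketched with hypothetical field widths; the patent does not fix them, though the 6-bit offset and column widths follow from the 64-byte block example, and the 10-bit row index is purely illustrative:

```python
# Hypothetical field widths, low bits to high: offset | column | row | tag.
OFFSET_BITS, COLUMN_BITS, ROW_BITS = 6, 6, 10

def parse_physical_address(paddr):
    """Split a physical address into the fields of FIG. 3:
    (tag, row_index, column_index, block_offset)."""
    offset = paddr & ((1 << OFFSET_BITS) - 1)
    paddr >>= OFFSET_BITS
    column = paddr & ((1 << COLUMN_BITS) - 1)
    paddr >>= COLUMN_BITS
    row = paddr & ((1 << ROW_BITS) - 1)
    tag = paddr >> ROW_BITS
    return tag, row, column, offset
```

The tag and row index together select the row (the concatenation described above), the column index selects the data block within it, and the offset is only used afterwards, to locate the data within a cache line once a cache level has been identified.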
  • Referring now to FIG. 4, an example multi-processor arrangement is depicted. In the example of FIG. 4, processors 402 and 410 each have their own private tracking tables 408 and 414 respectively and their own private first level caches 404 and 412 respectively. FIG. 4 further shows a second cache level 418, a third cache level 420, and a fourth cache level 422 that are shared between processor 402 and processor 410. As with the cache hierarchy shown in FIG. 1, the cache hierarchy shown in FIG. 4 may be used by processors 402 and 410 to store units of data cached from main memory 424. Processors 402 and 410 are also shown having their own region filters 406 and 416 respectively. Each region filter may monitor the cache coherence status of a unit of data and filter out unnecessary cache coherence transactions. In one example, cache coherence may be defined as the consistency of data stored in a shared resource, such as the shared cache levels shown in FIG. 4.
  • The region filters may also use the tracking tables to facilitate the location of data. In one example, a region filter may analyze a cache coherence transaction history associated with a unit of data to determine whether the data is likely in a private cache memory of a different processor. For example, region filter 406 may use a cache coherence history to determine whether a sought after unit of data is likely located in first level cache memory 412, which is exclusive to processor 410. If the region filter determines that the data is not likely in the private cache memory of the different processor, the region filter may locate the unit of data based on the coherence transaction history and the location information associated with the unit of data in the tracking table. By way of example, region filter 406 may use cache coherence transaction history and location information in tracking table 408 associated with a unit of data to determine where the unit is located in the shared cache memory levels.
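A region filter's decision could be sketched as follows; the coherence-state encoding, the "exclusive" heuristic, and all names are hypothetical stand-ins for whatever a real coherence protocol would track:

```python
def locate_unit(addr, coherence_history, tracking_table):
    """Sketch of a region-filter decision. If the coherence transaction
    history says another processor last took the line exclusively, the
    data is likely in that processor's private cache; otherwise fall
    back to the tracking table's location hint for the shared levels."""
    last = coherence_history.get(addr)
    if last is not None and last["state"] == "exclusive":
        return ("remote_private_cache", last["owner"])
    return ("shared_level", tracking_table.get(addr, 0))
```

In the FIG. 4 arrangement, region filter 406 would answer "probe processor 410's private cache" for lines it has seen taken exclusively, and otherwise point directly at a shared level using tracking table 408, avoiding a broadcast coherence transaction either way.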
  • Advantageously, the foregoing computer system, integrated circuit, and method enhance the retrieval of data stored in a cache hierarchy. Rather than searching through the cache hierarchy sequentially, the processor may be provided with a more accurate location of the data. In this regard, processor manufacturers may implement deeper cache hierarchies without being concerned about a reduction in processor performance. The techniques disclosed herein may also enhance the retrieval of cached data in multiprocessor systems. Furthermore, the tracking table may be used for improved cache level eviction such that the data is evicted to a level that better corresponds with an expected usage of the data.
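The improved eviction noted above, in which a victim is moved to a level matching its expected usage rather than simply to the next level down, can be sketched as follows. The integer level numbering and list-shaped history are illustrative assumptions:

```python
def choose_eviction_level(history, current_level, deepest_level):
    """Pick a destination level for an evicted unit of data.

    history is the list of hierarchy levels the unit previously
    occupied (most recent last); larger numbers are farther from the
    processor, and deepest_level stands for main memory.
    """
    # Prefer the most recent prior location below the current level,
    # so the data lands where its past usage suggests re-use.
    for level in reversed(history):
        if level > current_level:
            return level
    # No deeper prior location recorded: evict to main memory.
    return deepest_level
```

For example, a line being evicted from level 2 whose history records a stay in level 3 would be demoted to level 3 instead of going straight to memory.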
  • Although the disclosure herein has been described with reference to particular examples, it is to be understood that these examples are merely illustrative of the principles of the disclosure. It is therefore to be understood that numerous modifications may be made to the examples and that other arrangements may be devised without departing from the spirit and scope of the disclosure as defined by the appended claims. Furthermore, while particular processes are shown in a specific order in the appended drawings, such processes are not limited to any particular order unless such order is expressly set forth herein; rather, processes may be performed in a different order or concurrently and steps may be added or omitted.

Claims (20)

1. A computing system comprising:
a processor;
a multi-level cache hierarchy;
a main memory coupled to the processor; and
a tracking table to maintain location information and a location history for each unit of data that moves between the main memory and levels of the cache hierarchy, the processor to begin a search for a unit of data from a location specified in the tracking table, if the data is not in a first level cache of the hierarchy.
2. The computing system of claim 1, wherein the processor to:
determine whether to evict the unit of data from a level in the hierarchy in which the unit currently resides; and
if the unit of data is to be evicted, move the unit of data to another location that is determined based at least partially on associated location information in the tracking table.
3. The computing system of claim 1, wherein the tracking table to update location information associated with the unit of data, when the data moves to a different level of the hierarchy or when the unit is evicted from the hierarchy.
4. The computing system of claim 1, wherein location information and location history associated with each unit of data is stored in a data block in the tracking table corresponding to each unit of data, the tracking table being in near proximity to the first level cache of the hierarchy.
5. The computing system of claim 4, further comprising a translation lookaside buffer, each data block in the buffer to have a similar size as that of each data block in the tracking table, the tracking table to insert the data block associated with each unit of data, when the processor inserts a translation lookaside buffer data block associated with each unit of data.
6. The computing system of claim 4, further comprising at least one victim tracking table to store data blocks evicted from the tracking table.
7. The computing system of claim 1, further comprising a region filter to:
analyze a cache coherence transaction history associated with the unit of data to determine whether the data is likely in a private cache memory of a different processor; and
if the data is not likely in the private cache memory of the different processor, locate the unit of data based on the coherence transaction history and the location information associated with the unit of data.
8. An integrated circuit comprising:
a processor;
a multi-level cache hierarchy; and
a tracking table to store tracking history for each unit of data that moves between a main memory coupled to the processor and the hierarchy, the processor to update the history when a unit of data moves to a level in the hierarchy or when the unit is evicted from the hierarchy, the processor further to initiate a search for a given unit of data from a physical location specified in the history, if an attempt to retrieve the given unit from a first level of the hierarchy results in a cache miss.
9. The integrated circuit of claim 8, wherein the processor to:
determine whether to evict the given unit of data from a given level in the hierarchy in which the unit currently resides; and
if the given unit of data is to be evicted, move the unit to another location that is determined based at least partially on the tracking history associated with the data.
10. The integrated circuit of claim 8, wherein tracking history associated with the given unit of data is stored in a data block of the tracking table corresponding to the given unit of data, the tracking table being in near proximity to the first level of the hierarchy.
11. The integrated circuit of claim 10, further comprising a translation lookaside buffer, each data block in the buffer to have a similar size as that of each data block in the tracking table, the tracking table to insert the data block associated with the given unit of data, when the processor inserts a translation lookaside buffer data block associated with the given unit of data.
12. The integrated circuit of claim 10, further comprising at least one victim tracking table to store data blocks evicted from the tracking table.
13. The integrated circuit of claim 8, further comprising:
a region filter to analyze a cache coherence transaction history associated with the given unit of data to determine whether the data is likely in a private cache memory of a different processor; and
if the data is not likely in the private cache memory of the different processor, locate the given unit of data based on the coherence transaction history and the history associated with the given unit.
14. A method comprising:
reading, using a processor, a request to retrieve a unit of data;
determining, using the processor, whether the unit of data is cached in a first level cache memory of a multi-level cache hierarchy;
if the unit of data is not cached in the first level cache memory, reading, using the processor, a tracking table comprising a history of movements made by the data between a main memory and the cache hierarchy; and
initiating, using the processor, a search for the unit of data from a location specified in the history.
15. The method of claim 14, further comprising evicting, using the processor, the unit of data from a level in the hierarchy in which the unit currently resides to a prior location specified in the history, if it is determined that the unit of data is to be evicted from the level.
16. The method of claim 14, further comprising updating, using the tracking table, the history of movements, when the unit of data moves to a level in the hierarchy or when the unit of data is evicted from the hierarchy.
17. The method of claim 14, further comprising storing, using the tracking table, the history associated with the unit of data in a data block of the tracking table corresponding to the unit of data.
18. The method of claim 17, further comprising inserting, using the tracking table, the data block associated with the unit of data, when the processor inserts a translation lookaside buffer data block associated with the unit of data, each data block in the buffer having a similar size as that of each data block in the tracking table.
19. The method of claim 17, further comprising:
moving, using the tracking table, the history to a victim tracking table, when the history is evicted from the tracking table; and
evicting, using the processor, the unit of data from the cache hierarchy, when the history associated with the unit is evicted from the tracking table and the victim tracking table.
20. The method of claim 14, further comprising:
analyzing, using a region filter, a cache coherence transaction history associated with the unit of data to determine whether the data is likely in a private cache memory of a different processor; and
if the data is not likely in the private cache memory of the different processor, locating, using the region filter, the unit of data based on the coherence transaction history and the history associated with the unit.
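The retrieval flow recited in claims 14 and 16 can be sketched as follows; the dictionary-based caches, tracking table, and memory are illustrative assumptions, not claim limitations:

```python
def retrieve(addr, l1_cache, tracking_table, memory):
    """Return the unit of data at addr, starting the search on an
    L1 miss from the location recorded in the tracking table."""
    # Determine whether the unit is cached in the first level cache.
    if addr in l1_cache:
        return l1_cache[addr]
    # On a miss, read the tracking table's movement history instead
    # of searching every level of the hierarchy sequentially.
    level = tracking_table.get(addr)  # a cache-level dict, or None
    # Initiate the search from the location specified in the history.
    if level is not None and addr in level:
        return level[addr]
    # No valid recorded location: fetch from main memory.
    return memory[addr]
```

A hit in the tracking table thus jumps directly to the recorded level, which is the claimed alternative to walking the hierarchy level by level.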
US13/902,366 2013-05-24 2013-05-24 Multi-level cache tracking table Active 2034-01-25 US9514044B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/902,366 US9514044B2 (en) 2013-05-24 2013-05-24 Multi-level cache tracking table

Publications (2)

Publication Number Publication Date
US20140351518A1 true US20140351518A1 (en) 2014-11-27
US9514044B2 US9514044B2 (en) 2016-12-06

Family

ID=51936186

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/902,366 Active 2034-01-25 US9514044B2 (en) 2013-05-24 2013-05-24 Multi-level cache tracking table

Country Status (1)

Country Link
US (1) US9514044B2 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150143047A1 (en) * 2013-11-21 2015-05-21 Green Cache AB Systems and methods for direct data access in multi-level cache memory hierarchies
US20150220570A1 (en) * 2014-02-06 2015-08-06 International Business Machines Corporation Multilevel filters for cache-efficient access
US20150347297A1 (en) * 2014-05-29 2015-12-03 Green Cache AB Systems and methods for implementing a tag-less shared cache and a larger backing cache
KR20160098973A (en) * 2015-02-11 2016-08-19 삼성전자주식회사 Computing Apparatus and Method for Cache Management
US9852060B2 (en) * 2016-03-31 2017-12-26 Dell Products L.P. Storage class memory (SCM) memory mode cache system
WO2018013824A1 (en) * 2016-07-14 2018-01-18 Advanced Micro Devices, Inc. System and method for storing cache location information for cache entry transfer
US9940356B2 (en) 2014-07-31 2018-04-10 International Business Machines Corporation Efficient join-filters for parallel processing
CN108475234A (en) * 2015-11-04 2018-08-31 三星电子株式会社 The system and method for coherent memory is built in a multi-processor system
EP3486785A1 (en) * 2017-11-20 2019-05-22 Samsung Electronics Co., Ltd. Systems and methods for efficient cacheline handling based on predictions
US20220358178A1 (en) * 2021-08-04 2022-11-10 Beijing Baidu Netcom Science Technology Co., Ltd. Data query method, electronic device, and storage medium
US11544093B2 (en) * 2019-09-27 2023-01-03 Intel Corporation Virtual machine replication and migration

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11593109B2 (en) * 2021-06-07 2023-02-28 International Business Machines Corporation Sharing instruction cache lines between multiple threads
US11593108B2 (en) 2021-06-07 2023-02-28 International Business Machines Corporation Sharing instruction cache footprint between multiple threads

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020194431A1 (en) * 2001-06-16 2002-12-19 Samsung Electronics Co., Ltd. Multi-level cache system
US20030061450A1 (en) * 2001-09-27 2003-03-27 Mosur Lokpraveen B. List based method and apparatus for selective and rapid cache flushes
US20040030950A1 (en) * 2002-08-08 2004-02-12 International Business Machines Corporation Apparatus for imprecisely tracking cache line inclusivity of a higher level cache
US20040059877A1 (en) * 2002-09-20 2004-03-25 International Business Machines Corporation Method and apparatus for implementing cache state as history of read/write shared data
US20060112233A1 (en) * 2004-11-19 2006-05-25 Ibm Corporation Enabling and disabling cache bypass using predicted cache line usage
US20100023698A1 (en) * 2008-07-22 2010-01-28 International Business Machines Corp. Enhanced Coherency Tracking with Implementation of Region Victim Hash for Region Coherence Arrays
US20120102269A1 (en) * 2010-10-21 2012-04-26 Oracle International Corporation Using speculative cache requests to reduce cache miss delays

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6078992A (en) 1997-12-05 2000-06-20 Intel Corporation Dirty line cache
US6647466B2 (en) 2001-01-25 2003-11-11 Hewlett-Packard Development Company, L.P. Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy
WO2011044154A1 (en) 2009-10-05 2011-04-14 Marvell Semiconductor, Inc. Data caching in non-volatile memory

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Chen, Zhifeng, Yuanyuan Zhou, and Kai Li. "Eviction-based Cache Placement for Storage Caches." USENIX Annual Technical Conference, General Track. 2003. *
D. Kim, J. Ahn, J. Kim, J. Huh, Subspace snooping: Filtering snoops with operating system support, Proceedings of the Nineteenth International Conference on Parallel Architectures and Compilation Techniques, PACT'10, September 11-15, 2010 *
J. F. Cantin, M. H. Lipasti, and J. E. Smith. Improving multiprocessor performance with coarse-grain coherence tracking. In Proceedings of the 32nd annual international symposium on Computer Architecture(ISCA), pages 246-257, Washington, DC, USA, 2005. IEEE Computer Society *
J.F. Cantin, J.E. Smith, M.H. Lipasti, A. Moshovos, and B. Falsafi, Coarse-Grain Coherence Tracking: Regionscout and Region Coherence Arrays, IEEE Micro, vol. 26, no. 1, pp. 70-79, Jan. 2006. *
M. Ekman, P. Stenstrom, and F. Dahlgren. TLB and Snoop Energy-Reduction using Virtual Caches in Low-Power Chip-Multiprocessors. In Proceedings of the 2002 international symposium on Low power electronics and design (ISLPED), pages 243-246, New York, NY, USA, 2002. ACM. *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150143046A1 (en) * 2013-11-21 2015-05-21 Green Cache AB Systems and methods for reducing first level cache energy by eliminating cache address tags
US10402344B2 (en) * 2013-11-21 2019-09-03 Samsung Electronics Co., Ltd. Systems and methods for direct data access in multi-level cache memory hierarchies
US20150143047A1 (en) * 2013-11-21 2015-05-21 Green Cache AB Systems and methods for direct data access in multi-level cache memory hierarchies
US10671543B2 (en) * 2013-11-21 2020-06-02 Samsung Electronics Co., Ltd. Systems and methods for reducing first level cache energy by eliminating cache address tags
US9734170B2 (en) * 2014-02-06 2017-08-15 International Business Machines Corporation Multilevel filters for cache-efficient access
US20150220570A1 (en) * 2014-02-06 2015-08-06 International Business Machines Corporation Multilevel filters for cache-efficient access
US20150220573A1 (en) * 2014-02-06 2015-08-06 International Business Machines Corporation Multilevel filters for cache-efficient access
US9740714B2 (en) * 2014-02-06 2017-08-22 International Business Machines Corporation Multilevel filters for cache-efficient access
US20150347297A1 (en) * 2014-05-29 2015-12-03 Green Cache AB Systems and methods for implementing a tag-less shared cache and a larger backing cache
US20150347302A1 (en) * 2014-05-29 2015-12-03 Green Cache AB Management of shared pipeline resource usage based on level information
US10019368B2 (en) 2014-05-29 2018-07-10 Samsung Electronics Co., Ltd. Placement policy for memory hierarchies
US10031849B2 (en) * 2014-05-29 2018-07-24 Samsung Electronics Co., Ltd. Tracking alternative cacheline placement locations in a cache hierarchy
US20150347298A1 (en) * 2014-05-29 2015-12-03 Green Cache AB Tracking alternative cacheline placement locations in a cache hierarchy
US10409725B2 (en) * 2014-05-29 2019-09-10 Samsung Electronics Co., Ltd. Management of shared pipeline resource usage based on level information
US10402331B2 (en) * 2014-05-29 2019-09-03 Samsung Electronics Co., Ltd. Systems and methods for implementing a tag-less shared cache and a larger backing cache
US9940356B2 (en) 2014-07-31 2018-04-10 International Business Machines Corporation Efficient join-filters for parallel processing
US9946748B2 (en) 2014-07-31 2018-04-17 International Business Machines Corporation Efficient join-filters for parallel processing
KR20160098973A (en) * 2015-02-11 2016-08-19 삼성전자주식회사 Computing Apparatus and Method for Cache Management
KR102516882B1 (en) * 2015-02-11 2023-04-17 삼성전자주식회사 Computing Apparatus and Method for Cache Management
CN108475234A (en) * 2015-11-04 2018-08-31 三星电子株式会社 The system and method for coherent memory is built in a multi-processor system
US20180329819A1 (en) * 2015-11-04 2018-11-15 Samsung Electronics Co., Ltd. Systems and methods for implementing coherent memory in a multiprocessor system
US10754777B2 (en) * 2015-11-04 2020-08-25 Samsung Electronics Co., Ltd. Systems and methods for implementing coherent memory in a multiprocessor system
US11237969B2 (en) 2015-11-04 2022-02-01 Samsung Electronics Co., Ltd. Systems and methods for implementing coherent memory in a multiprocessor system
US11615026B2 (en) 2015-11-04 2023-03-28 Samsung Electronics Co., Ltd. Systems and methods for implementing coherent memory in a multiprocessor system
US10394710B2 (en) 2016-03-31 2019-08-27 Dell Products L.P. Storage class memory (SCM) memory mode cache system
US9852060B2 (en) * 2016-03-31 2017-12-26 Dell Products L.P. Storage class memory (SCM) memory mode cache system
WO2018013824A1 (en) * 2016-07-14 2018-01-18 Advanced Micro Devices, Inc. System and method for storing cache location information for cache entry transfer
US10956339B2 (en) 2016-07-14 2021-03-23 Advanced Micro Devices, Inc. System and method for storing cache location information for cache entry transfer
EP3486785A1 (en) * 2017-11-20 2019-05-22 Samsung Electronics Co., Ltd. Systems and methods for efficient cacheline handling based on predictions
US11138121B2 (en) 2017-11-20 2021-10-05 Samsung Electronics Co., Ltd. Systems and methods for efficient cacheline handling based on predictions
US11544093B2 (en) * 2019-09-27 2023-01-03 Intel Corporation Virtual machine replication and migration
US20220358178A1 (en) * 2021-08-04 2022-11-10 Beijing Baidu Netcom Science Technology Co., Ltd. Data query method, electronic device, and storage medium

Also Published As

Publication number Publication date
US9514044B2 (en) 2016-12-06

Similar Documents

Publication Publication Date Title
US9514044B2 (en) Multi-level cache tracking table
US11314647B2 (en) Methods and systems for managing synonyms in virtually indexed physically tagged caches
US10860323B2 (en) Method and apparatus for processing instructions using processing-in-memory
CN107111455B (en) Electronic processor architecture and method of caching data
US9251095B2 (en) Providing metadata in a translation lookaside buffer (TLB)
US8209499B2 (en) Method of read-set and write-set management by distinguishing between shared and non-shared memory regions
CN109240945B (en) Data processing method and processor
US9411733B2 (en) Sharing pattern-based directory coherence for multicore scalability (“SPACE”)
US9697137B2 (en) Filtering translation lookaside buffer invalidations
EP2710472B1 (en) Memory with metadata stored in a portion of the memory pages
US11023376B2 (en) System and methods for efficient virtually-tagged cache implementation
US7454573B2 (en) Cost-conscious pre-emptive cache line displacement and relocation mechanisms
US20140281180A1 (en) Data coherency management
Franey et al. Tag tables
WO2015075673A4 (en) Systems and methods for reducing first level cache energy by eliminating cache address tags
CN112753024B (en) External memory based translation look-aside buffer
US20160140042A1 (en) Instruction cache translation management
US20190155747A1 (en) Performing maintenance operations
Chou et al. CANDY: Enabling coherent DRAM caches for multi-node systems
US10831673B2 (en) Memory address translation
US20160140043A1 (en) Instruction ordering for in-progress operations
US10853262B2 (en) Memory address translation using stored key entries
US8473686B2 (en) Computer cache system with stratified replacement
CN107562806B (en) Self-adaptive sensing acceleration method and system of hybrid memory file system
US9639467B2 (en) Environment-aware cache flushing mechanism

Legal Events

Date Code Title Description
AS Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHANG, JICHUAN;YOON, DOE HYUN;RANGANATHAN, PARTHASARATHY;REEL/FRAME:030485/0087

Effective date: 20130522

AS Assignment

Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001

Effective date: 20151027

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8