WO2024066195A1 - Cache management method and apparatus, cache apparatus, electronic apparatus, and medium - Google Patents
- Publication number
- WO2024066195A1 (PCT/CN2023/078664)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- memory bank
- cache
- cache line
- memory
- far
- Prior art date
- 2022-09-27
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
Definitions
- Embodiments of the present disclosure relate to a cache management method, a cache device, a cache management device, an electronic device, and a computer-readable storage medium.
- At least one embodiment of the present disclosure provides a cache management method for a shared cache shared by multiple processor cores, the cache management method comprising: allocating a respective near memory bank and far memory bank to each processor core; for each processor core's memory access request, preferentially accessing the corresponding near memory bank, and then accessing the corresponding far memory bank.
- At least one embodiment of the present disclosure also provides a cache device, including: a shared cache shared by multiple processor cores, the shared cache including multiple memory banks; and a cache management unit configured to allocate a respective near memory bank and far memory bank to each processor core, and to cause each processor core's memory access requests to access the corresponding near memory bank first and then the corresponding far memory bank.
- At least one embodiment of the present disclosure further provides a cache management device, including: a processor; and a memory storing computer executable instructions, which implement the cache management method provided by at least one embodiment of the present disclosure when executed by the processor.
- At least one embodiment of the present disclosure further provides an electronic device, including a cache, a cache device provided by at least one embodiment of the present disclosure, and a plurality of processor cores.
- At least one embodiment of the present disclosure further provides a computer-readable storage medium for non-transiently storing computer-executable instructions, which implement the cache management method provided by at least one embodiment of the present disclosure when executed by a processor.
- the cache management method provided by the embodiments of the present disclosure divides the shared cache into near memory banks and far memory banks, which can reduce the additional physical latency introduced by the size of the shared cache and improve performance.
- FIG. 1 shows a schematic diagram of the structure of a multi-core processor system;
- FIG. 2 is a schematic diagram showing the mapping relationship between main memory and cache under direct-mapped, fully associative, and set-associative schemes;
- FIG. 3 is a schematic diagram showing the set-associative organization and addressing mode of a cache;
- FIG. 4 shows a schematic flow chart of a cache management method provided by at least one embodiment of the present disclosure;
- FIG. 5 is a schematic diagram showing the mapping relationship between a near memory bank and a far memory bank in a private cache and a shared cache according to an embodiment;
- FIG. 6 is a schematic flow chart showing an example of step S402 in FIG. 4;
- FIG. 7 is a schematic flow chart showing another example of step S402 in FIG. 4;
- FIG. 8A shows a schematic flow chart of a cache management method for read requests;
- FIG. 8B shows a schematic flow chart of a cache management method for write-back requests;
- FIG. 8C is a schematic block diagram showing an example of cache line migration;
- FIG. 9A shows a schematic block diagram of a cache device provided by at least one embodiment of the present disclosure;
- FIG. 9B shows a schematic structural diagram of a cache device provided by at least one embodiment of the present disclosure;
- FIG. 10 shows a schematic diagram of a cache management device according to an embodiment of the present disclosure; and
- FIG. 11 shows a schematic diagram of an electronic device according to an embodiment of the present disclosure.
- FIG1 shows a multi-core processor system, which is a centralized shared memory system, where processing cores core0, core1, core2 and core3 have their own dedicated caches (private caches), one or more levels of shared caches (usually the last level cache (LLC)), and share the same main memory and input/output (I/O).
- the dedicated cache of each processing core may include a first level cache (L1 cache) or a second level cache (L2 cache), etc.
- the capacity of the cache is usually very small; the content stored in the cache is only a subset of the content of the main memory, and data exchange between the cache and the main memory is performed in blocks.
- to cache data from the main memory, some function must be applied to map main memory addresses to cache locations; this is called address mapping.
- after the data in the main memory is cached according to this mapping relationship, the central processing unit (CPU), when executing a program, transforms the main memory addresses in the program into cache addresses.
- the address mapping methods of a cache usually include direct mapping, fully associative mapping, and set-associative mapping.
- although the capacity of the cache is smaller than that of the main memory, its speed is much faster; therefore, the main function of the cache is to store data that the processor may need to access frequently in the near future. In this way, the processor can read data directly from the cache without frequently accessing the slower main memory, thereby improving the processor's effective access speed to the main memory.
- the basic unit of cache is a cache block or cache line. Similar to the division of cache into multiple cache lines, the data stored in the main memory is also similarly divided. The divided data blocks in the main memory are called main memory blocks. Generally, the size of a main memory block can be 4KB, and the size of a cache line can also be 4KB. It is understandable that in actual applications, the size of the main memory block and the cache line can also be set to other values, as long as the size of the main memory block is the same as the size of the cache line.
- there is a mapping relationship between the main memory and the cache, which can be direct-mapped, fully associative, or set-associative.
- the principle of the mapping relationship under the direct-mapped, fully associative, and set-associative schemes is shown in FIG. 2. The main memory and the cache are both divided into blocks of the same size. Assume the main memory has 32 entries and the cache has 8 entries. In the direct-mapped scheme, each main memory block can only be placed at one cache line position. To place block No. 12 of the main memory into the cache, since the cache has only 8 entries, it can only go to entry (12 mod 8 = 4) and nowhere else; thus main memory blocks No. 4, 12, 20, and 28 all map to entry 4 of the cache, and a conflict forces a replacement.
- the hardware required for the direct-mapped scheme is simple but inefficient, as shown in FIG. 2(a). In the fully associative scheme, each main memory block can be placed at any position in the cache, so main memory blocks No. 4, 12, 20, and 28 can reside in the cache at the same time.
- the hardware required for the fully associative method is complex but efficient, as shown in Figure 2(b).
- Set associativity is a compromise between direct associativity and full associativity.
- taking two-way set associativity as an example, positions 0, 2, 4, and 6 in the cache form one way (here called way 0), and positions 1, 3, 5, and 7 form another way (here called way 1), with 4 blocks in each way.
- for block No. 12 of the main memory, because the remainder of 12 divided by 4 is 0, block No. 12 can be placed at position 0 of way 0 (i.e., position 0 of the cache) or at position 0 of way 1 (i.e., position 1 of the cache), as shown in FIG. 2(c).
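- As an illustration (not part of the patent text), the placement arithmetic of the three mapping schemes can be sketched in C, reproducing the numbers of the example above (32 main memory blocks, an 8-entry cache, 2 ways):

```c
#include <stdio.h>

int main(void) {
    int block = 12;        /* main memory block number from the example */
    int cache_lines = 8;   /* cache entries                             */
    int sets = 4;          /* 2-way set-associative: 8 lines / 2 ways   */

    /* Direct-mapped: exactly one legal position, 12 mod 8 = 4. */
    printf("direct-mapped  : entry %d\n", block % cache_lines);

    /* Fully associative: any of the 8 entries. */
    printf("fully assoc.   : any entry 0..%d\n", cache_lines - 1);

    /* 2-way set-associative: one set (12 mod 4 = 0), one candidate per way. */
    printf("set-associative: set %d, way 0 or way 1\n", block % sets);
    return 0;
}
```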
- the set-associative organization and addressing mode of the cache in FIG. 2(c) can be further illustrated by the example of FIG. 3.
- the cache is organized in the form of a cache line array.
- a column of cache lines constitutes one way, and the cache lines at the same position across multiple columns constitute a set, so the cache lines of the same set are in different ways, that is, they are reached through different ways.
- the location of data or an instruction in the cache is obtained from the physical address of the data or instruction to be read, and each physical address (which, depending on the system's specifications, may comprise multiple bits, e.g., 32 bits) is divided into three parts (a bit-arithmetic sketch follows this list):
- ● Index: used to select a set in the cache; all cache lines in the same set are selected by the index;
- ● Tag: used to select a specific cache line within a set. The tag of the physical address is compared with the tag of each cache line; if they match, it is a cache hit and this cache line is selected; otherwise it is a cache miss;
- ● Offset: used to select the corresponding address within the cache line; it indicates the first byte of the physical address in the cache line, and the corresponding data or instruction is read starting from that byte.
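- A minimal sketch in C of the decomposition above; the field widths (64-byte cache line, 256 sets, 32-bit address) are illustrative assumptions rather than values fixed by the present disclosure:

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 6u   /* log2(64-byte cache line) -- assumed width */
#define INDEX_BITS  8u   /* log2(256 sets)           -- assumed width */

int main(void) {
    uint32_t paddr  = 0x12345678u;  /* example 32-bit physical address */
    uint32_t offset = paddr & ((1u << OFFSET_BITS) - 1u);
    uint32_t index  = (paddr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
    uint32_t tag    = paddr >> (OFFSET_BITS + INDEX_BITS);
    /* the index selects the set, the tag selects a line within the set,
     * and the offset selects the byte within the line */
    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```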
- the working principle of the cache requires it to store the latest or most frequently used data as much as possible.
- when a cache line is transferred from the main memory to the cache and the available positions in the cache are already occupied, a cache data replacement problem arises. Solving this problem involves the data replacement mechanism of the cache system.
- in short, the data replacement mechanism of the cache system includes two steps:
- first, screening out the data in the cache that is "unimportant" to the application's accesses;
- second, removing that data from the cache to make room for newly arriving data; data with the dirty attribute must also be written back to the main memory.
- existing replacement algorithms include LRU (Least Recently Used), LFU (Least Frequently Used), MRU (Most Recently Used), NRU (Not Recently Used), SRRIP (Static RRIP), and so on; a minimal LRU sketch is given below.
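- A minimal sketch of one such policy (LRU, using the aging information discussed later in this disclosure); the field names are illustrative, and eviction picks a free way if one exists, otherwise the oldest way:

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_WAYS 8

typedef struct {
    uint32_t tag;
    bool     valid;
    bool     dirty;  /* dirty lines must be written back before reuse */
    uint8_t  age;    /* aging information: 0 = most recently used     */
} cache_line_t;

/* Step 1: screen out the "unimportant" line, i.e. a free way if any
 * exists, otherwise the way with the largest age (least recently used). */
int lru_select_victim(const cache_line_t set[NUM_WAYS]) {
    int victim = 0;
    for (int w = 0; w < NUM_WAYS; w++) {
        if (!set[w].valid) return w;              /* free line: no eviction */
        if (set[w].age > set[victim].age) victim = w;
    }
    return victim;  /* step 2 writes it back to memory first if dirty */
}

/* On a hit, the touched way becomes the youngest; ways that were
 * younger than it each age by one. */
void lru_touch(cache_line_t set[NUM_WAYS], int hit_way) {
    for (int w = 0; w < NUM_WAYS; w++)
        if (set[w].valid && set[w].age < set[hit_way].age) set[w].age++;
    set[hit_way].age = 0;
}
```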
- the cache includes a large number of storage cells, each of which is used to store a data bit. These storage cells are physically arranged in an array, and each storage cell is accessed through word lines and bit lines. All storage cells in each cache are divided and organized into multiple sub-arrays for easy access, and each sub-array is called a bank. For example, input buffers and output buffers can be provided for each bank to facilitate access (reading, writing, etc.); for example, different banks can also be accessed in parallel at the same time. For example, for the above-mentioned set-associative situation, multiple cache lines in the same way can be physically located in different banks.
- At least one embodiment of the present disclosure provides a cache management method for a shared cache shared by multiple processor cores, the cache management method comprising: allocating a respective near memory bank and far memory bank to each processor core; for each processor core's memory access request, preferentially accessing the corresponding near memory bank, and then accessing the corresponding far memory bank.
- the cache management method provided by the above embodiment of the present disclosure divides the shared cache into near memory banks and far memory banks, reducing the additional physical latency introduced by the size of the shared cache and improving performance.
- At least one embodiment of the present disclosure also provides a cache device, a cache management device, an electronic device, and a computer-readable storage medium corresponding to the above-mentioned cache management method.
- Fig. 4 shows a schematic flow chart of a cache management method provided by at least one embodiment of the present disclosure.
- the cache management method is used for a shared cache shared by multiple processor cores, and the shared cache includes multiple memory banks.
- the cache management method includes the following steps S401 - S402 .
- Step S401: allocate a respective near memory bank and far memory bank to each processor core.
- Step S402: for each processor core's memory access request, access the corresponding near memory bank first, and then access the corresponding far memory bank.
- the access latency of a processor core to a corresponding near memory bank is less than the access latency to a corresponding far memory bank.
- the "near” and “far” here refer to each processor core. Therefore, a memory bank in the cache that is a near memory bank (or far memory bank) for one processor core may not be a near memory bank (or far memory bank) for another processor core.
- FIG. 5 is a schematic diagram showing a mapping relationship between a near memory bank and a far memory bank in a private cache and a shared cache according to an embodiment.
- the shared cache includes multiple memory banks (near memory bank 0, far memory bank 0, far memory bank 1, etc.). Both the private cache and the shared cache have a “way-set” structure, and the cache lines in the private cache and the cache lines in the memory banks in the shared cache have the same size.
- a cache line in a certain way of a certain set in a processor core's private cache may correspond, in the shared cache, to cache lines in certain ways of different sets in different memory banks.
- the first cache line in the same group and the same way in the private cache may correspond to a cache line in the near bank 0 and a cache line in the far bank 0.
- the second cache line in the same group and the same way in the private cache may correspond to a cache line in the near bank 0 and a cache line in the far bank 1.
- the embodiments of the present disclosure are not limited to the above exemplary correspondence.
- the near memory bank is accessed first, and then the far memory bank is accessed.
- both the near memory bank and the far memory bank can be accessed.
- each processor core mainly accesses its own near memory bank.
- in this way, for a single processor core, memory access latency is minimal and does not depend on the address mapping relationship, since all of the core's memory accesses are concentrated in its near memory bank; for multiple processor cores, each core's accesses are likewise concentrated in the near memory bank physically closest to it. Whether one or many processor cores are active, latency can be reduced and performance improved.
- FIG. 6 shows a schematic flow chart of an example of step S402 in FIG. 4 .
- step S402 may include the following steps S601 - S602 .
- Step S601: operate on the corresponding near memory bank according to the physical address of the read request.
- step S601 may include: when the read request hits in the corresponding near memory bank, returning the data in the near memory bank to the processor core.
- step S601 may further include: when the read request hits in the corresponding near memory bank, invalidating the copy in the corresponding near memory bank and updating the directory of the shared cache.
- the processor core sends a read request to the shared cache.
- the index information of the physical address of the read request identifies a near memory bank; the near memory bank is accessed and a lookup is performed using the tag information of the physical address.
- the tag of the physical address is compared with the tag of each cache line in the near memory bank. If they match, the cache hits, this cache line is selected, and the data in this cache line is returned to the processor core.
- the copy in the near memory bank is then invalidated, and the directory in the shared cache is updated.
- Step S602: when the read request does not hit in the near memory bank, the read request is routed to the corresponding far memory bank according to the physical address of the read request, and the corresponding far memory bank is operated on.
- the index information of the physical address identifies a far memory bank; the far memory bank is accessed, a lookup is performed using the tag information of the physical address, and the access_farBank_flag (access-far-memory-bank flag) is set.
- step S602 may include: when the read request hits the corresponding far memory bank, returning the data in the corresponding far memory bank to the processor core.
- step S602 may also include: retaining the copy in the corresponding far memory bank, increasing the aging information of the cache line that stores the data in the corresponding far memory bank, and updating the directory of the shared cache.
- the cache management method provided by the embodiments of the present disclosure may further include step S603: when the read request hits in neither the corresponding near memory bank nor the corresponding far memory bank, operating with respect to the other processor cores by checking the directory of the shared cache.
- step S603 may include: if the requested data exists in another processor core, returning the data from that processor core to the processor core that issued the read request, and updating the directory in the shared cache; if the requested data does not exist in any other processor core, sending a read request to the memory to obtain the data.
- that is, when a read request hits in neither the corresponding near memory bank nor the corresponding far memory bank, the directory in the shared cache can be checked to determine whether another processor core holds the requested data. If so, the data can be returned to the requesting processor core through a core-to-core transfer, and the directory in the shared cache is updated. If not, the read request is sent to the memory to obtain the requested data.
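- The read path of steps S601 to S603 (also FIG. 8A below) can be summarized in a compact C sketch; the types, the directory helpers, and the per-bank array search are illustrative stand-ins, not the structures of the present disclosure:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef uint64_t tag_t;
typedef uint64_t data_t;

#define LINES_PER_BANK 8

typedef struct {
    tag_t   tag;
    bool    valid;
    uint8_t age;
    data_t  data;
} line_t;

typedef struct { line_t lines[LINES_PER_BANK]; } bank_t;

static line_t *lookup(bank_t *b, tag_t tag) {
    for (int i = 0; i < LINES_PER_BANK; i++)
        if (b->lines[i].valid && b->lines[i].tag == tag)
            return &b->lines[i];
    return NULL;
}

/* Stubs standing in for the shared-cache directory, the core-to-core
 * transfer, and the memory read described in the text. */
static void   directory_update(tag_t t)             { (void)t; }
static int    directory_owner(tag_t t)              { (void)t; return -1; }
static data_t core_to_core_transfer(int c, tag_t t) { (void)c; (void)t; return 0; }
static data_t memory_read(tag_t t)                  { (void)t; return 0; }

data_t handle_read(bank_t *near_bank, bank_t *far_bank, tag_t tag,
                   bool *access_farBank_flag) {
    line_t *ln = lookup(near_bank, tag);
    if (ln) {                        /* S601: near-bank hit            */
        data_t d = ln->data;
        ln->valid = false;           /* invalidate the near copy       */
        directory_update(tag);
        return d;
    }
    *access_farBank_flag = true;     /* S602: route to the far bank    */
    ln = lookup(far_bank, tag);
    if (ln) {                        /* far-bank hit: copy is retained */
        ln->age++;                   /* increase aging information     */
        directory_update(tag);
        return ln->data;
    }
    int owner = directory_owner(tag);    /* S603: consult directory    */
    if (owner >= 0)
        return core_to_core_transfer(owner, tag);
    return memory_read(tag);             /* miss everywhere: read memory */
}
```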
- FIG. 7 is a schematic flowchart showing another example of step S402 in FIG. 4 .
- step S402 may include the following steps S701 - S702 .
- Step S701: operate on the corresponding near memory bank according to the physical address of the write-back request.
- step S701 may include: when the write-back request hits in the corresponding near memory bank, updating the state stored in the corresponding near memory bank; when the write-back request does not hit in the corresponding near memory bank, placing the first victim cache line in the corresponding near memory bank through a replacement algorithm.
- the processor core issues a write-back request to the shared cache.
- according to the index information of the physical address of the write-back request, the corresponding near memory bank is identified.
- the near memory bank is accessed and a lookup is performed using the tag information of the physical address.
- the tag of the physical address is compared with the tag of each cache line in the near memory bank. If a match occurs, the cache hits.
- in the case of a miss, the cache line carried by the write-back request (the line evicted from the processor core's private cache) is called the first victim cache line.
- the decision of which cache line is evicted to make room is controlled by the replacement algorithm.
- the first victim cache line needs to be placed in the corresponding near memory bank through the replacement algorithm.
- placing the first victim cache line in the corresponding near memory bank through a replacement algorithm may include: when there is a free cache line in the corresponding near memory bank to store the first victim cache line, storing the first victim cache line in the corresponding near memory bank and updating the directory of the shared cache; when there is no free cache line in the corresponding near memory bank to store the first victim cache line, generating a second victim cache line in the corresponding near memory bank, performing a migration operation on the second victim cache line, setting a flag indicating write-back to the far memory bank, sending the second victim cache line and its corresponding aging information to the corresponding far memory bank, and updating the directory of the shared cache.
- when there is a free cache line in the corresponding near memory bank to store the first victim cache line, the first victim cache line is placed directly in the near memory bank, and the directory of the shared cache is updated.
- otherwise, a second victim cache line is generated in the near memory bank, the first victim cache line is placed at the position of the second victim cache line, and the second victim cache line is migrated within the shared cache.
- for the migration, the victim_far_flag signal (write-back-to-far-memory-bank flag) needs to be set, the second victim cache line and its corresponding aging information are sent together to the far memory bank, and the directory is updated to reflect the write-back operation issued by the processor core.
- the parameter of the replacement algorithm is aging information for the least recently used algorithm (LRU) and is usage frequency information for the least frequently used algorithm (LFU).
- Step S702: when a cache line migration operation is generated in the corresponding near memory bank, an operation is performed on the corresponding far memory bank according to the address information of the first victim cache line of the corresponding near memory bank.
- step S702 may include: when the write-back request hits in the corresponding far memory bank, updating the state of the cache line in the corresponding far memory bank; when the write-back request does not hit in the corresponding far memory bank, determining according to a replacement algorithm whether to select a third victim cache line in the corresponding far memory bank to be written back to the memory.
- the corresponding far memory bank is accessed and a lookup is performed using the tag information of the physical address; the tag of the physical address is compared with the tag of each cache line in the far memory bank. If a match occurs, the cache hits, indicating that the second victim cache line already exists in the far memory bank, so only the state stored in the far memory bank needs to be updated. In the case of a miss, the replacement algorithm is used to determine whether a third victim cache line needs to be generated in the corresponding far memory bank and written back to the memory.
- determining according to the replacement algorithm whether to select a third victim cache line in the corresponding far memory bank and write it back to the memory may include: if the replacement algorithm shows that a free cache line is available in the corresponding far memory bank, storing the second victim cache line in the corresponding far memory bank; if the replacement algorithm shows that no free cache line is available in the corresponding far memory bank, writing the second victim cache line or the third victim cache line back to the memory.
- if a free cache line is available, the second victim cache line is placed directly in the far memory bank.
- otherwise, a third victim cache line is generated in the far memory bank, and it is determined whether the second victim cache line or the third victim cache line is written to the memory. The following three methods can be used to decide which victim cache line is written to the memory, although the embodiments of the present disclosure are not limited to these three methods.
- Method 1: for the LRU replacement algorithm, the aging value of the second victim cache line is compared with the aging value of the third victim cache line, and whichever of the two has the larger aging value is preferentially written back to the memory (as sketched below); if the aging value of the second victim cache line is equal to that of the third victim cache line, the second victim cache line is written back to the memory.
- Method 2: prioritize writing back the second victim cache line or the third victim cache line through register configuration.
- Method 3: check the directory of the shared cache, and if the second victim cache line also exists in the near memory bank corresponding to another processor core, write the second victim cache line in that near memory bank back to the memory.
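- Method 1's comparison can be written out as a small C helper; the convention that a larger aging value means "older" (least recently used) follows the LRU discussion above, and the names are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t addr;
    uint8_t  age;   /* aging value: larger = less recently used */
} victim_t;

/* Returns true if the second victim cache line should be written back
 * to memory, false if the third victim cache line should be. Ties go
 * to the second victim, as Method 1 specifies. */
bool write_back_second_victim(victim_t second, victim_t third) {
    if (second.age == third.age) return true;
    return second.age > third.age;   /* write back the older of the two */
}
```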
- FIG8A is a schematic flow chart showing a cache management method for a read request.
- as shown in FIG. 8A, the processor core issues a read request and preferentially accesses the corresponding near memory bank according to the physical address of the read request, and it is determined whether the read request hits in the corresponding near memory bank. If it hits, the copy in the corresponding near memory bank is invalidated, the data in the near memory bank is returned to the processor core, and the directory of the shared cache is then updated. If it misses, access_farBank_flag is set to 1 and the read request is sent to the corresponding far memory bank. It is then determined whether the read request hits in the corresponding far memory bank.
- if it hits, the copy in the far memory bank is retained, the aging (age) information of the cache line stored in the far memory bank is increased, the data in the corresponding far memory bank is returned to the processor core, and the directory of the shared cache is then updated. If it misses, it is determined whether the read request hits in another processor core by checking the directory of the shared cache. If it hits, the data is returned to the processor core that issued the read request through a core-to-core transfer, and the directory in the shared cache is then updated. If it misses, the read request is sent to the memory (an example of the system memory) to obtain the requested data.
- FIG8B shows a schematic flow chart of a cache management method for write-back requests.
- as shown in FIG. 8B, the processor core sends a write-back request to the corresponding near memory bank, and it is determined whether the write-back request hits in the corresponding near memory bank. If it hits, the directory of the shared cache is updated. If it misses, the first victim cache line is placed in the corresponding near memory bank through the replacement algorithm, and it is determined whether a second victim cache line is generated. If there is a free cache line in the corresponding near memory bank to store the first victim cache line, no second victim cache line is generated, and the directory of the shared cache is updated.
- otherwise, a second victim cache line is generated, the second victim cache line is migrated to the corresponding far memory bank, and the write-back request is routed to the corresponding far memory bank. It is then determined whether the second victim cache line hits in the corresponding far memory bank. If it hits, the write-back request is completed. If it misses, it is determined according to the replacement algorithm whether a third victim cache line will be generated in the corresponding far memory bank. If there is a free cache line available in the corresponding far memory bank, no third victim cache line is generated, and it is determined whether the second victim cache line should be written back to the memory.
- if yes, the second victim cache line is written back to the memory; if no, the second victim cache line is placed in the corresponding far memory bank. If there is no free cache line available in the corresponding far memory bank, a third victim cache line is generated, the second victim cache line is placed in the corresponding far memory bank, and the third victim cache line is written back to the memory.
- FIG. 8C is a schematic block diagram showing an example of cache line migration.
- Memory bank 0 is the near memory bank of Core0 and the far memory bank of Core1/2/3; memory bank 1 is the near memory bank of Core1 and the far memory bank of Core0/2/3; memory bank 2 is the near memory bank of Core2 and the far memory bank of Core0/1/3; memory bank 3 is the near memory bank of Core3 and the far memory bank of Core0/1/2.
- Core0 reads data from the corresponding near memory bank (memory bank 0)
- Core1 reads data from the corresponding far memory bank (memory bank 2)
- Core2 reads data from other processor cores (Core3)
- Core3 reads data from the memory.
- copy A in memory bank 0 is migrated to Core0
- copy C in memory bank 2 is migrated to Core1 and memory bank 2 retains copy C
- copy D in Core3 is migrated to Core2 and Core3 retains copy D
- copy E in memory is migrated to Core3.
- Core0, Core1, and Core3 generate write-back requests respectively.
- the victim cache line (copy A) in Core0 is written back to memory bank 0.
- the victim cache line (copy C) in Core1 is written back to memory bank 1, the victim cache line (copy B) in memory bank 1 is migrated to the far memory bank (memory bank 2) corresponding to Core1, and the victim cache line (copy C) in memory bank 2 is written back to the memory.
- the victim cache line F in the near memory bank (memory bank 3) corresponding to Core3 is migrated to the corresponding far memory bank (memory bank 0), and the victim cache line F is written back to the memory from memory bank 0.
- FIG. 9A shows a schematic block diagram of a cache device 900 provided by at least one embodiment of the present disclosure.
- the cache device can be used to execute the cache management method shown in FIG. 4 .
- the cache device 900 includes a shared cache 901 for multiple processor cores to share and a cache management unit 902.
- the cache management unit 902 includes a near memory bank receiving component 903, a far memory bank receiving component 904, a near memory bank pipeline control component 905, a far memory bank pipeline control component 906, a near memory bank result return component 907, and a far memory bank result return component 908.
- the shared cache 901 includes a plurality of memory banks.
- the cache management unit 902 is configured to allocate a respective near memory bank and far memory bank to each processor core, and to cause each processor core's memory access requests to access the corresponding near memory bank first and then the corresponding far memory bank.
- the access latency of the processor core to the corresponding near memory bank is shorter than the access latency to the corresponding far memory bank.
- the near memory bank receiving component 903 is configured to receive memory access requests sent to the corresponding near memory bank.
- the far memory bank receiving component 904 is configured to receive a memory access request sent to a corresponding far memory bank.
- the near bank pipeline control component 905 is configured to determine the processing mode of the memory access request received by the corresponding near bank and whether it hits in the corresponding near bank, and execute a replacement algorithm for the corresponding near bank.
- the far memory bank pipeline control component 906 is configured to determine the processing mode of the memory access request received by the corresponding far memory bank and whether it hits in the corresponding far memory bank, and execute a replacement algorithm for the corresponding far memory bank.
- the near memory bank result return component 907 is configured to return the results of near-memory-bank accesses required by the processor core to the processor core.
- the far memory bank result return component 908 is configured to return the results of far-memory-bank accesses required by the processor core to the processor core.
- the cache device 900 has the same technical effects as the cache management method shown in FIG. 4 , which will not be described in detail herein.
- FIG. 9B shows a schematic structural diagram of a cache device 910 provided by at least one embodiment of the present disclosure.
- the near memory bank receiving component 911 receives the memory access request sent by the processor to the near memory bank, and sends the memory access request to the near memory bank pipeline control component 912.
- the near memory bank pipeline control component 912 is connected to the near memory bank storage component 913, the near memory bank result return component 914, and the far memory bank receiving component 915.
- the near memory bank result return component 914 is responsible for returning results to the processor.
- the far memory bank receiving component 915 receives the memory access request from the near memory bank pipeline control component 912 and sends it to the far memory bank pipeline control component 916.
- the far memory bank pipeline control component 916 is connected to the far memory bank storage component 917, the far memory bank result return component 918, and the memory 919.
- the far memory bank result return component 918 can read data from the memory 919 and is responsible for returning results to the processor.
- the near memory bank receiving component and the far memory bank receiving component can be implemented in hardware such as a queue, for example a FIFO (First In First Out) queue, and the present disclosure does not limit this.
- the near memory bank storage component and the far memory bank storage component are used to store cache line information, and can take the form of static random access memory (SRAM), dynamic random access memory (DRAM), etc.; the present disclosure does not limit this.
- the memory can be on-chip storage or off-chip storage, and the present disclosure does not limit this.
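- A structural sketch of the FIG. 9B datapath in C, assuming the FIFO implementation mentioned above for the receiving components; the queue depth and field names are illustrative assumptions:

```c
#include <stdint.h>

#define REQ_QUEUE_DEPTH 16

/* FIFO of pending memory access requests (one possible implementation
 * of a receiving component, per the note above). */
typedef struct {
    uint64_t entries[REQ_QUEUE_DEPTH];
    int head, tail;
} req_fifo_t;

typedef struct {
    req_fifo_t near_receive;   /* near memory bank receiving component (911) */
    req_fifo_t far_receive;    /* far memory bank receiving component  (915) */
    /* pipeline control components (912/916): decide hit/miss handling
     * and run the replacement algorithm; modeled here as callbacks    */
    void (*near_pipeline)(req_fifo_t *in);
    void (*far_pipeline)(req_fifo_t *in);
    void *near_storage;        /* near bank storage, e.g. SRAM (913)   */
    void *far_storage;         /* far bank storage, e.g. SRAM  (917)   */
} cache_device_t;
```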
- the cache device may be implemented using hardware, software, firmware, or any feasible combination thereof, and the present disclosure is not limited thereto.
- At least one embodiment of the present disclosure further provides a cache management device, comprising: a memory for non-transiently storing computer-executable instructions; and a processor for executing the computer-executable instructions, wherein the computer-executable instructions, when executed by the processor, implement the cache management method provided by at least one embodiment of the present disclosure.
- FIG10 shows a schematic diagram of a cache management device 1000 according to an embodiment of the present disclosure.
- the cache management device 1000 may include a processing device 1001 and a memory 1002, which may be interconnected via a bus 1003.
- the processing device 1001 can perform various actions and processes according to the program or code stored in the memory 1002.
- the processing device 1001 can be an integrated circuit chip with signal processing capabilities.
- the above-mentioned processing device can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the processing device can implement or execute the various methods, steps, processes, and logic block diagrams disclosed in the embodiments of the present disclosure.
- the general-purpose processor can be a microprocessor, or the processor can be any conventional processor, etc., and can be of the X86 architecture or the ARM architecture, for example.
- the memory 1002 stores computer executable instructions, wherein the computer executable instructions implement the cache management method provided by at least one embodiment of the present disclosure when executed by the processing device 1001.
- the memory 1002 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
- the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), or an electrically erasable programmable read-only memory (EEPROM).
- the volatile memory may be a random access memory (RAM) that is used as an external cache.
- by way of example and not limitation, many forms of RAM are available, such as:
- SRAM: static random access memory
- DRAM: dynamic random access memory
- SDRAM: synchronous dynamic random access memory
- DDR SDRAM: double data rate synchronous dynamic random access memory
- ESDRAM: enhanced synchronous dynamic random access memory
- SLDRAM: synchronous link dynamic random access memory
- DRRAM: direct main memory bus random access memory
- At least one embodiment of the present disclosure further provides an electronic device, including a cache and a cache device provided by at least one embodiment of the present disclosure and multiple processor cores.
- the electronic device is, for example, a central processing unit, and the processor is, for example, a single-core or multi-core processor.
- the electronic device is, for example, a computer system, and the computer system includes one or more processors.
- Fig. 11 shows a schematic diagram of an electronic device 1100 according to an embodiment of the present disclosure.
- the electronic device 1100 according to an embodiment of the present disclosure may include a cache device 900, a cache 1101, and a plurality of processor cores 1102.
- At least one embodiment of the present disclosure provides a computer-readable storage medium for non-transitory storage of computer-executable instructions, which implement the cache management method provided by at least one embodiment of the present disclosure when executed by a processor.
- the computer-readable storage medium in the embodiments of the present disclosure may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. It should be noted that the memories of the methods described herein are intended to include, but not be limited to, these and any other suitable types of memory.
- the embodiment of the present disclosure also provides a computer program product or a computer program, which includes a computer instruction, and the computer instruction is stored in a computer-readable storage medium.
- the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the cache management method according to the embodiment of the present disclosure.
Abstract
A cache management method, a cache apparatus, a cache management apparatus, an electronic apparatus, and a computer-readable storage medium. The cache management method is used for a shared cache shared by a plurality of processor cores, the cache management method comprising: allocating a respective near memory bank and far memory bank to each processor core; and, for each processor core's memory access requests, first accessing the corresponding near memory bank, and then accessing the corresponding far memory bank. In the method, the shared cache is divided into near memory banks and far memory banks, thereby reducing the additional physical delay introduced by the size of the shared cache and improving performance.
Description
This application claims priority to Chinese Patent Application No. 202211183443.X, filed on September 27, 2022; the entire disclosure of the above Chinese patent application is incorporated herein by reference as part of this application.
Embodiments of the present disclosure relate to a cache management method, a cache device, a cache management device, an electronic device, and a computer-readable storage medium.
In the design of multi-core processors, memory access operations are a major factor affecting performance, and cache technology is currently widely used to reduce latency and improve processor performance. However, due to the limitation of chip size, the capacity of the cache inside a processor core is limited and can only meet the needs of some memory access operations. It has therefore been proposed to add a larger cache outside the cores as a storage unit shared between multiple processing cores, that is, a shared cache, thereby reducing memory access latency and improving performance.
In order to make the purpose, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure will be described clearly and completely below in conjunction with the drawings of the embodiments of the present disclosure. Obviously, the described embodiments are some, not all, of the embodiments of the present disclosure. Based on the described embodiments of the present disclosure, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the scope of protection of the present disclosure.
Unless otherwise defined, the technical or scientific terms used in the present disclosure shall have the ordinary meaning understood by a person with ordinary skill in the field to which the present disclosure belongs. The words "first", "second", and similar words used in the present disclosure do not denote any order, quantity, or importance, but are only used to distinguish different components. Likewise, words such as "a", "an", or "the" do not denote a limitation of quantity, but rather denote the existence of at least one. Words such as "include" or "comprise" mean that the element or object preceding the word covers the elements or objects listed after the word and their equivalents, without excluding other elements or objects. Words such as "connect" or "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
图1示出了一种多核处理器体系,该体系为集中式共享存储器体系,处理核core0、core1、core2和core3具有各自的专用缓存(私有缓存),具有一级或多级共享缓存(通常是最后一级缓存(LLC)),并且共享同一主存以及输入/输出(I/O)。每个处理核的专用缓存可以包括一级缓存(L1cache)或二级缓存(L2cache)等。FIG1 shows a multi-core processor system, which is a centralized shared memory system, where processing cores core0, core1, core2 and core3 have their own dedicated caches (private caches), one or more levels of shared caches (usually the last level cache (LLC)), and share the same main memory and input/output (I/O). The dedicated cache of each processing core may include a first level cache (L1 cache) or a second level cache (L2 cache), etc.
例如,通常缓存的容量很小,缓存保存的内容只是主存内容的一个子集,且缓存与主存的数据交换是以块为单位的。为了把主存中的数据缓存到缓存中,必须应用某种函数把主存地址定位到缓存中,这称为地址映射。在将主存中的数据按这种映射关系缓存到缓存中后,中央处理器(central processing unit,CPU)执行程序时,会将程序中的主存地址变换成缓存地址。缓存的地址映射方式通常有直接映射、全相联和组相联映射。For example, the capacity of the cache is usually very small, the content stored in the cache is only a subset of the content of the main memory, and the data exchange between the cache and the main memory is in blocks. In order to cache the data in the main memory into the cache, a certain function must be applied to locate the main memory address to the cache, which is called address mapping. After the data in the main memory is cached into the cache according to this mapping relationship, when the central processing unit (CPU) executes the program, it will transform the main memory address in the program into the cache address. The address mapping methods of the cache usually include direct mapping, fully associative mapping, and set associative mapping.
虽然缓存的容量相较于主存来说较小,但是速度相较于主存来说却快的多,因此缓存的主要功能是用来存储近期处理器可能需要频繁访问到的数据。
这样处理器便可以直接到缓存中进行数据读取,而无需频繁地访问速度较慢的主存,以此来提高处理器对主存的访问速度。缓存的基本单位是缓存块或缓存行(cache line)。与缓存分成多个缓存行类似,主存中存储的数据也进行了类似划分。主存中的划分出来的数据块称为主存块。通常,一个主存块的大小可以为4KB,一个缓存行的大小也可以为4KB。可以理解的是,实际应用中,还可以将主存块和缓存行的大小设置为其他值,仅需保证主存块的大小与缓存行的大小相同即可。Although the cache capacity is smaller than the main memory, its speed is much faster than the main memory. Therefore, the main function of the cache is to store data that the processor may need to access frequently in the near future. In this way, the processor can directly read data from the cache without frequently accessing the slower main memory, thereby improving the processor's access speed to the main memory. The basic unit of cache is a cache block or cache line. Similar to the division of cache into multiple cache lines, the data stored in the main memory is also similarly divided. The divided data blocks in the main memory are called main memory blocks. Generally, the size of a main memory block can be 4KB, and the size of a cache line can also be 4KB. It is understandable that in actual applications, the size of the main memory block and the cache line can also be set to other values, as long as the size of the main memory block is the same as the size of the cache line.
主存和缓存之间具有一定映射关系,该映射关系可以是直接相联、全相联和组相联。在直接相联、全相联和组相联中,主存和缓存的映射关系原理如图2所示。将主存和缓存都分为大小一样的块。假设主存有32项,缓存有8项。在直接相联方式中,每个主存块只能放到缓存中的一个缓存行的位置上。假设要把主存的第12号块放到缓存中,因为缓存只有8项,所以只能放在第(12mod 8=4)项上,其他地方都不能放;由此可知第4、12、20、28号主存块都对应到缓存的第4项上,如果冲突了就只能替换。直接相联方式所需的硬件简单但效率低,如图2的(a)所示。在全相联方式中,每个主存块都可以放到缓存的任一位置上,这样第4、12、20、28号主存块可以同时放入缓存中。全相联方式所需的硬件复杂但效率高,如图2的(b)所示。组相联是直接相联和全相联的折中。以两路(way)组相联为例,缓存中第0、2、4、6号位置为一路(这里称为第0路),第1、3、5、7号位置为另一路(这里称为第1路),每路4个块。对于主存的第12号块,因为12除以4余数为0,所以既可以把第12号块放到缓存的第0路的第0号位置(即缓存的第0号位置),也可以放到第1路的第0号位置(即缓存的第1号位置),如图2的(c)所示。There is a certain mapping relationship between the main memory and the cache, which can be direct associative, fully associative, and set associative. In direct associative, fully associative, and set associative, the mapping relationship between the main memory and the cache is shown in Figure 2. Divide the main memory and the cache into blocks of the same size. Assume that the main memory has 32 items and the cache has 8 items. In the direct associative method, each main memory block can only be placed in a cache line in the cache. Assume that the 12th block of the main memory is to be placed in the cache. Since the cache has only 8 items, it can only be placed in the (12mod 8=4)th item and cannot be placed anywhere else; it can be seen that the 4th, 12th, 20th, and 28th main memory blocks all correspond to the 4th item of the cache. If there is a conflict, they can only be replaced. The hardware required for the direct associative method is simple but inefficient, as shown in Figure 2 (a). In the fully associative method, each main memory block can be placed in any position of the cache, so that the 4th, 12th, 20th, and 28th main memory blocks can be placed in the cache at the same time. The hardware required for the fully associative method is complex but efficient, as shown in Figure 2(b). Set associativity is a compromise between direct associativity and full associativity. Taking two-way set associativity as an example, positions 0, 2, 4, and 6 in the cache are one way (here called way 0), and positions 1, 3, 5, and 7 are another way (here called way 1), with 4 blocks in each way. For block No. 12 in the main memory, because the remainder of 12 divided by 4 is 0, block No. 12 can be placed in position No. 0 of way 0 of the cache (i.e. position No. 0 of the cache), or in position No. 0 of way 1 (i.e. position No. 1 of the cache), as shown in Figure 2(c).
图2的(c)中缓存的组相联的组织形式和寻址方式可以采用图3的示例进行进一步说明。如图3所示,缓存被组织为缓存行数组的形式。一列缓存行组成同一路,多列缓存行中相同位置的多个缓存行组成一个组(set),因此同一组的缓存行在不同的路中,即通过不同路被范围。通过要读取的数据或指令的物理地址获取数据或指令在缓存中的位置,每个物理地址(例如根据系统的规格可以包括多个位,例如32位)被分为三部分:The set-associative organization and addressing mode of the cache in (c) of FIG2 can be further illustrated by the example of FIG3. As shown in FIG3, the cache is organized in the form of a cache line array. A column of cache lines constitutes the same way, and multiple cache lines at the same position in multiple columns of cache lines constitute a set, so the cache lines of the same set are in different ways, that is, they are ranged by different ways. The location of the data or instruction in the cache is obtained by the physical address of the data or instruction to be read, and each physical address (for example, it can include multiple bits, such as 32 bits, according to the specifications of the system) is divided into three parts:
● Index, used to select a set in the cache; all cache lines in the same set are selected by the index;
● Tag, used to select a specific cache line within the set; the tag of the physical address is compared with the tag of each cache line, and if they match it is a cache hit and that cache line is selected, otherwise it is a cache miss;
● Offset, used to select the corresponding address within the cache line; it indicates the first byte of the physical address within the cache line, and the corresponding data or instruction is read starting from that byte.
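A minimal sketch of this three-way decomposition follows; the 64-byte line size and 1024-set geometry are assumptions chosen for illustration, since the description only requires that the address comprise tag, index, and offset fields:

```python
OFFSET_BITS = 6   # assumed 64-byte cache lines (not specified here)
INDEX_BITS = 10   # assumed 1024 sets (not specified here)

def split_address(paddr: int):
    """Split a physical address into (tag, index, offset) fields."""
    offset = paddr & ((1 << OFFSET_BITS) - 1)
    index = (paddr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = paddr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x804213A4)
# index selects the set; tag is compared with every line in that set;
# offset locates the first byte to read within the hitting cache line.
```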
The working principle of a cache requires it to hold the newest or most frequently used data as far as possible. When a cache line is transferred from main memory to the cache and all usable positions in the cache are already occupied, the problem of replacing cached data arises. Solving this problem involves the cache system's data replacement mechanism, which in short comprises two steps:
First, identify the data in the cache that is "unimportant" to the application's accesses;
Second, remove that data from the cache to make room for incoming data; data with the dirty attribute must additionally be written back to main memory.
Existing replacement algorithms include LRU (Least Recently Used), LFU (Least Frequently Used), MRU (Most Recently Used), NRU (Not Recently Used), SRRIP (Static RRIP), and so on.
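As one concrete instance of these policies, the LRU idea for a single set can be sketched as follows (a generic illustration of LRU bookkeeping, not the replacement logic of the present disclosure):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set that evicts the least recently used tag when full."""

    def __init__(self, num_ways: int):
        self.num_ways = num_ways
        self.lines = OrderedDict()  # tag -> data, least recently used first

    def access(self, tag, data=None):
        if tag in self.lines:                # hit: refresh recency
            self.lines.move_to_end(tag)
            return self.lines[tag]
        if len(self.lines) >= self.num_ways: # full: evict the LRU victim
            self.lines.popitem(last=False)
        self.lines[tag] = data               # fill the set on a miss
        return None
```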
A cache comprises a large number of memory cells, each storing one bit of data. These cells are physically arranged in arrays, and each cell is accessed through word lines and bit lines. All the cells of a cache are partitioned and organized into multiple sub-arrays for ease of access, and each sub-array is called a memory bank. For example, each bank may be provided with input and output buffers to facilitate accesses (reads, writes, and so on); different banks may also be accessed in parallel at the same time. For example, in the set-associative case described above, the cache lines of one way may be physically located in different banks.
For a multi-core processor, when only a single core is working, it can use its private cache and all the resources of the shared cache. When the single core's memory accesses reach the shared cache, if an address maps to a remote part of the shared cache, the additional physical latency reduces the performance gain of the large capacity and lowers single-core performance. Likewise, when multiple cores are working, each can use its private cache and part of the shared cache; if every core's accesses to the shared cache map to remote physical regions (for example, remote memory banks), the additional physical latency likewise lowers multi-core performance. As performance requirements rise, larger out-of-core caches are needed, which also means a larger shared-cache area. Once the size grows beyond a certain point, the additional physical latency introduced offsets part of the performance gain the cache provides.
At least one embodiment of the present disclosure provides a cache management method for a shared cache shared by multiple processor cores. The cache management method comprises: allocating a respective near memory bank and far memory bank to each processor core; and, for each processor core's memory access requests, preferentially accessing the corresponding near memory bank before accessing the corresponding far memory bank.
The cache management method provided by the above embodiments of the present disclosure divides the shared cache into near memory banks and far memory banks, reducing the latency added by the physical size of the shared cache and improving performance.
At least one embodiment of the present disclosure further provides a cache device, a cache management device, an electronic device, and a computer-readable storage medium corresponding to the above cache management method.
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings, but the present disclosure is not limited to these specific embodiments.
Figure 4 shows a schematic flowchart of a cache management method provided by at least one embodiment of the present disclosure. The cache management method is used for a shared cache shared by multiple processor cores, the shared cache comprising multiple memory banks.
As shown in Figure 4, the cache management method comprises the following steps S401 and S402.
Step S401: allocate a respective near memory bank and far memory bank to each processor core.
Step S402: for each processor core's memory access request, preferentially access the corresponding near memory bank, then access the corresponding far memory bank.
For example, a processor core's access latency to its corresponding near memory bank is smaller than its access latency to its corresponding far memory bank. It should be noted that "near" and "far" are defined per processor core: a memory bank in the cache that is near (or far) for one processor core is not necessarily near (or far) for another.
Figure 5 is a schematic diagram of the mapping relationship, in one embodiment, between a private cache and the near and far memory banks of the shared cache.
As shown in Figure 5, the shared cache comprises multiple memory banks (near memory bank 0, far memory bank 0, far memory bank 1, and so on). Both the private cache and the shared cache have a way-set structure, and the cache lines in the private cache have the same size as the cache lines in the shared cache's memory banks. For example, a cache line in a given way of a given set in a processor core's private cache may correspond to cache lines in given ways of different sets in different memory banks of the shared cache. For example, a first cache line in one set and way of the private cache may correspond to a cache line in near memory bank 0 and a cache line in far memory bank 0; as another example, a second cache line in the same set and way of the private cache may correspond to a cache line in near memory bank 0 and a cache line in far memory bank 1. Embodiments of the present disclosure are not limited to these exemplary correspondences.
For example, every memory access request of each processor core first accesses the near memory bank and then the far memory bank. When a single processor core is working, it can access both its near memory bank and its far memory banks. When multiple processor cores are working, each core mainly accesses its own near memory bank.
For a single processor core's memory access requests, access latency is minimized and does not depend on the address mapping, since all of the core's accesses are preferentially concentrated in its near memory bank. The accesses of multiple processor cores are likewise concentrated in the physically closest memory banks. Whether one core or many cores are working, latency is reduced and performance improves.
Figure 6 shows a schematic flowchart of one example of step S402 in Figure 4.
As shown in Figure 6, for a read request as the memory access request, one example of step S402 may comprise the following steps S601 and S602.
Step S601: operate on the corresponding near memory bank according to the physical address of the read request.
For example, in some embodiments of the present disclosure, step S601 may comprise: when the read request hits in the corresponding near memory bank, returning the data in the near memory bank to the processor core.
For example, in some embodiments of the present disclosure, step S601 may further comprise: when the read request hits in the corresponding near memory bank, invalidating the copy in the corresponding near memory bank and updating the directory of the shared cache.
For example, the processor core issues a read request to the shared cache. The index information of the read request's physical address selects a near memory bank; that bank is accessed and searched using the tag information of the physical address, comparing the address's tag with the tag of every cache line in the bank. On a match, the cache hits: the matching cache line is selected, its data is returned to the processor core, the copy in the near memory bank is invalidated, and the shared cache's directory is updated. The advantage of doing so is that, because the near memory bank's resources are limited, its storage location can be given up to store other data.
Step S602: when the read request misses in the near memory bank, route the read request to the corresponding far memory bank according to its physical address, and operate on the corresponding far memory bank.
For example, if the tag of the physical address matches none of the cache line tags in the near memory bank, the cache misses. The index information of the physical address then selects a far memory bank, which is accessed and searched using the tag information of the physical address, and access_farBank_flag (the access-far-bank flag) is marked.
For example, in some embodiments of the present disclosure, step S602 may comprise: when the read request hits in the corresponding far memory bank, returning the data in the corresponding far memory bank to the processor core.
For example, in some embodiments of the present disclosure, step S602 may further comprise: retaining the copy in the far memory bank, increasing the aging information stored in the corresponding far memory bank for the cache line holding the data, and updating the directory of the shared cache.
For example, when a read request misses in the near memory bank and accesses the far memory bank, a lookup is likewise performed using the tag information of the physical address: the address's tag is compared with the tag of every cache line in the far memory bank. On a match, the cache hits: the matching cache line is selected, its data is returned to the processor core, the copy in the far memory bank is retained, the aging information stored for this cache line in the far memory bank is increased, and the shared cache's directory is updated. The copy is retained here because this far memory bank may simultaneously serve as the near memory bank of other processor cores, which have priority in using it. When this far memory bank is accessed as a far bank, its stored contents are not disturbed, so that its contents as some processor core's near memory bank are preserved.
For example, the cache management method provided by embodiments of the present disclosure may further comprise step S603: when the read request misses in both the corresponding near memory bank and the corresponding far memory bank, operating on other processor cores by consulting the directory of the shared cache.
For example, in some embodiments of the present disclosure, step S603 may comprise: when the requested data exists in other processor cores, returning the data from those cores to the processor core that issued the read request and updating the directory in the shared cache; and when the requested data does not exist in other processor cores, sending the read request to memory to obtain the requested data.
For example, when a read request misses in both the corresponding near memory bank and the corresponding far memory bank, the directory in the shared cache can be consulted to determine whether another processor core holds the data requested by the read request. If another core holds the requested data, the data can be returned to the requesting core by a core-to-core transfer, and the directory in the shared cache is updated. If no other core holds the requested data either, the read request must be sent to memory to obtain it.
Figure 7 shows a schematic flowchart of another example of step S402 in Figure 4.
As shown in Figure 7, for a write-back request among the memory access requests, another example of step S402 may comprise the following steps S701 and S702.
Step S701: operate on the corresponding near memory bank according to the physical address of the write-back request.
For example, in some embodiments of the present disclosure, step S701 may comprise: when the write-back request hits in the corresponding near memory bank, updating the state stored in the corresponding near memory bank; and when the write-back request misses in the corresponding near memory bank, setting the first victim cache line in the corresponding near memory bank through a replacement algorithm.
For example, the processor core issues a write-back request to the shared cache. The index information of the write-back request's physical address selects a near memory bank; that bank is accessed and searched using the tag information of the physical address, comparing the address's tag with the tag of every cache line in the bank. On a match the cache hits; the cache line concerned is called the first victim cache line, and which cache line is victimized is controlled by the replacement algorithm. When the write-back request hits in the corresponding near memory bank, the first victim cache line already exists there, so only the state stored in the near memory bank needs to be updated. On a miss, the first victim cache line must be placed into the corresponding near memory bank through the replacement algorithm.
For example, in some embodiments of the present disclosure, storing the first victim cache line in the corresponding near memory bank through the replacement algorithm may comprise: when a free cache line is available in the corresponding near memory bank for the first victim cache line, storing the first victim cache line in the corresponding near memory bank and updating the directory of the shared cache; and when no free cache line is available in the corresponding near memory bank for the first victim cache line, having the corresponding near memory bank produce a second victim cache line, performing a migration operation on the second victim cache line, marking the flag indicating write-back to the far memory bank, sending the second victim cache line together with its corresponding aging information to the corresponding far memory bank, and updating the directory of the shared cache.
For example, when a free cache line is available in the corresponding near memory bank for the first victim cache line, the first victim cache line is placed directly in the near memory bank and the shared cache's directory is updated. When no free cache line is available, a second victim cache line is produced in the near memory bank, the first victim cache line is placed at its position, and the second victim cache line is migrated within the shared cache: the victim_far_flag signal (write-back-to-far-bank flag) is marked, the second victim cache line is sent to the far memory bank together with its corresponding aging information, and the directory updates caused by the write-back operation previously generated by the processor core are applied. It should be noted that the replacement algorithm's parameter is aging information for the least recently used (LRU) algorithm and usage frequency information for the least frequently used (LFU) algorithm. The embodiments of the present disclosure take aging information as an example, without limitation.
Step S702: when the corresponding near memory bank produces a cache line migration operation, operate on the corresponding far memory bank according to the address information of the first victim cache line of the corresponding near memory bank.
For example, in some embodiments of the present disclosure, step S702 may comprise: when the write-back request hits in the corresponding far memory bank, updating the state of the cache line in the corresponding far memory bank; and when the write-back request misses in the corresponding far memory bank, determining according to the replacement algorithm whether a third victim cache line will be selected in the corresponding far memory bank to be written back to memory.
For example, the index information of the physical address of the near memory bank's write-back request selects a far memory bank; that bank is accessed and searched using the tag information of the physical address, comparing the address's tag with the tag of every cache line in the far memory bank. On a match the cache hits, meaning the second victim cache line already exists in the far memory bank, so only the state stored there needs to be updated. On a miss, the replacement algorithm must decide whether a third victim cache line is to be produced in the corresponding far memory bank and written back to memory.
For example, in some embodiments of the present disclosure, determining according to the replacement algorithm whether a third victim cache line will be selected in the corresponding far memory bank to be written back to memory may comprise: when the replacement algorithm shows that a free cache line is available in the corresponding far memory bank, storing the second victim cache line in the corresponding far memory bank; and when the replacement algorithm shows that no free cache line is available in the corresponding far memory bank, writing the second victim cache line or the third victim cache line back to memory.
For example, when a free cache line is available in the corresponding far memory bank for the second victim cache line, the second victim cache line is placed directly in the far memory bank. When no free cache line is available, a third victim cache line is produced in the far memory bank, and it is decided whether the second or the third victim cache line is written to memory. This decision can be made in at least the following three ways (a sketch of method one follows the list); embodiments of the present disclosure are not limited to these three ways.
Method one: for the LRU replacement algorithm, compare the aging value of the second victim cache line with that of the third victim cache line, and preferentially write back to memory whichever of the two has the larger aging value; if the aging values of the second and third victim cache lines are equal, write the second victim cache line back to memory.
Method two: configure, through a register, whether the second victim cache line or the third victim cache line is preferentially written back.
Method three: consult the directory of the shared cache; if the second victim cache line exists in the near memory bank corresponding to another processor core, write the second victim cache line in that core's near memory bank back to memory.
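A sketch of method one under an LRU-style policy is given below; the dictionary shape and the "age" key are hypothetical, and a larger aging value is taken to mean a staler line, as in the description above:

```python
def pick_writeback_victim(second_victim: dict, third_victim: dict) -> dict:
    """Return the line to write back to memory under method one (LRU aging)."""
    # Prefer the line with the larger aging value; on a tie, the second
    # victim cache line (the one migrated from the near memory bank) goes.
    if second_victim["age"] >= third_victim["age"]:
        return second_victim
    return third_victim
```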
Figure 8A is a schematic flowchart of a cache management method for a read request.
As shown in Figure 8A, the processor core first issues a read request, which preferentially accesses the corresponding near memory bank according to the read request's physical address, and it is determined whether the read request hits there. On a hit, the copy in the corresponding near memory bank is invalidated, the data in the near memory bank is returned to the processor core, and the shared cache's directory is updated. On a miss, access_farBank_flag is set to 1 and the read request is sent to the corresponding far memory bank. It is then determined whether the read request hits in the corresponding far memory bank. On a hit, the copy in the far memory bank is retained, the aging information stored for the cache line in the far memory bank is increased, the data in the corresponding far memory bank is returned to the processor core, and the shared cache's directory is updated. On a miss, the shared cache's directory is consulted to determine whether the read request hits in another processor core. If so, the data is returned to the requesting core by a core-to-core transfer and the directory in the shared cache is updated. If not, the read request is sent to memory (an example of system storage) to obtain the requested data.
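The control flow of Figure 8A can be condensed into the following sketch; the near_bank, far_bank, directory, and memory objects and all of their method names are hypothetical stand-ins for the hardware, and only the branching mirrors the description:

```python
def handle_read(core, paddr, near_bank, far_bank, directory, memory):
    # 1) Preferentially access the near memory bank.
    line = near_bank.lookup(paddr)
    if line is not None:
        near_bank.invalidate(paddr)         # give up the near-bank copy
        directory.update(paddr, owner=core)
        return line.data

    # 2) Miss: mark the flag and route the request to the far memory bank.
    access_farBank_flag = 1                 # mirrors the flag in the text
    line = far_bank.lookup(paddr)
    if line is not None:
        far_bank.bump_age(paddr)            # the far-bank copy is retained
        directory.update(paddr, owner=core)
        return line.data

    # 3) Miss in both banks: consult the shared cache directory.
    owner = directory.find_owner(paddr)
    if owner is not None:
        data = owner.transfer(paddr)        # core-to-core transfer
        directory.update(paddr, owner=core)
        return data

    # 4) Finally, fetch the data from memory.
    return memory.read(paddr)
```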
Figure 8B is a schematic flowchart of a cache management method for a write-back request.
As shown in Figure 8B, the processor core first issues a write-back request to the corresponding near memory bank, and it is determined whether the write-back request hits there. On a hit, the shared cache's directory is updated. On a miss, the first victim cache line is set in the corresponding near memory bank through the replacement algorithm, and it is determined whether a second victim cache line is produced. If a free cache line is available in the corresponding near memory bank for the first victim cache line, no second victim cache line is produced and the shared cache's directory is updated. If no free cache line is available, a second victim cache line is produced and migrated to the corresponding far memory bank, and the write-back request is routed to that far memory bank. It is then determined whether the second victim cache line hits in the corresponding far memory bank; on a hit the write-back request completes, and on a miss the replacement algorithm decides whether a third victim cache line will be produced in the corresponding far memory bank. If a free cache line is available there, no third victim cache line is produced; it is decided whether the second victim cache line is to be written back to memory, and if so it is written back to memory, otherwise it is placed in the corresponding far memory bank. If no free cache line is available in the corresponding far memory bank, a third victim cache line is produced, the second victim cache line is placed in the corresponding far memory bank, and the third victim cache line is written back to memory.
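Likewise, the write-back flow of Figure 8B as a control-flow sketch over the same kind of hypothetical bank objects (method names such as install, has_free_line, and should_bypass are illustrative, not from the disclosure):

```python
def handle_writeback(victim1, near_bank, far_bank, directory, memory):
    # 1) Hit in the near memory bank: only the stored state is updated.
    if near_bank.lookup(victim1.addr) is not None:
        directory.update(victim1.addr)
        return

    # 2) Miss: install victim1; a full bank yields a second victim line.
    victim2 = near_bank.install(victim1)
    if victim2 is None:                    # a free line absorbed victim1
        directory.update(victim1.addr)
        return

    victim_far_flag = 1                    # mirrors the write-back-to-far flag
    # 3) Migrate victim2 (with its aging info) to the far memory bank.
    if far_bank.lookup(victim2.addr) is not None:
        return                             # already present: request complete

    if far_bank.has_free_line():           # no third victim is generated
        if far_bank.should_bypass(victim2):
            memory.write(victim2)          # write victim2 straight to memory
        else:
            far_bank.install(victim2)
    else:
        victim3 = far_bank.evict()         # third victim cache line
        far_bank.install(victim2)
        memory.write(victim3)              # or victim2, per methods one-three
```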
Figure 8C is a schematic block diagram of an example of cache line migration.
As shown in Figure 8C, in the initial state there are four processor cores (Core0, Core1, Core2, Core3), Core3 holds copy D, and the shared cache comprises four memory banks (bank 0, bank 1, bank 2, bank 3): bank 0 holds copy A, bank 1 holds copy B, bank 2 holds copy C, and bank 3 holds copy F. Bank 0 is the near memory bank of Core0 and a far memory bank of Core1/2/3; bank 1 is the near memory bank of Core1 and a far memory bank of Core0/2/3; bank 2 is the near memory bank of Core2 and a far memory bank of Core0/1/3; bank 3 is the near memory bank of Core3 and a far memory bank of Core0/1/2.
First, Core0 reads data from its near memory bank (bank 0), Core1 reads data from a far memory bank (bank 2), Core2 reads data from another processor core (Core3), and Core3 reads data from memory. After these read operations, copy A in bank 0 migrates to Core0; copy C in bank 2 migrates to Core1 while bank 2 retains copy C; copy D in Core3 migrates to Core2 while Core3 retains copy D; and copy E in memory migrates to Core3. Then Core0, Core1, and Core3 each generate write-back requests. After the write-back operations, Core0's victim cache line (copy A) is written back into bank 0. Core1's victim cache line (copy C) is written back into bank 1, bank 1's victim cache line (copy B) migrates to Core1's far memory bank (bank 2), and bank 2's victim cache line (copy C) is written back to memory. The victim cache line F in Core3's near memory bank (bank 3) migrates to a far memory bank (bank 0), from which it is written back to memory.
Figure 9A shows a schematic block diagram of a cache device 900 provided by at least one embodiment of the present disclosure; the cache device can be used to perform the cache management method shown in Figure 4.
As shown in Figure 9A, the cache device 900 comprises a shared cache 901 shared by multiple processor cores and a cache management unit 902. The cache management unit 902 comprises a near memory bank receiving component 903, a far memory bank receiving component 904, a near memory bank pipeline control component 905, a far memory bank pipeline control component 906, a near memory bank result return component 907, and a far memory bank result return component 908. Here, the shared cache 901 comprises multiple memory banks.
The cache management unit 902 is configured to allocate a respective near memory bank and far memory bank to each processor core, and to cause each processor core's memory access requests to preferentially access the corresponding near memory bank before accessing the corresponding far memory bank.
For example, a processor core's access latency to its corresponding near memory bank is smaller than its access latency to its corresponding far memory bank.
The near memory bank receiving component 903 is configured to receive memory access requests sent to the corresponding near memory bank.
The far memory bank receiving component 904 is configured to receive memory access requests sent to the corresponding far memory bank.
The near memory bank pipeline control component 905 is configured to determine how a memory access request received by the corresponding near memory bank is to be processed and whether it hits in the corresponding near memory bank, and to execute the replacement algorithm for the corresponding near memory bank.
The far memory bank pipeline control component 906 is configured to determine how a memory access request received by the corresponding far memory bank is to be processed and whether it hits in the corresponding far memory bank, and to execute the replacement algorithm for the corresponding far memory bank.
The near memory bank result return component 907 is configured to return the results required by the processor core to the processor core.
The far memory bank result return component 908 is configured to return the results required by the processor core to the processor core.
The cache device 900 has the same technical effects as the cache management method shown in Figure 4, which are not repeated here.
Figure 9B shows a schematic structural diagram of a cache device 910 provided by at least one embodiment of the present disclosure.
As shown in Figure 9B, the near memory bank receiving component 911 receives memory access requests sent by the processor to the near memory bank and forwards them to the near memory bank pipeline control component 912, which is connected to the near memory bank storage component 913, the near memory bank result return component 914, and the far memory bank receiving component 915. The near memory bank result return component 914 is responsible for returning results to the processor. The far memory bank receiving component 915 receives memory access requests from the near memory bank pipeline control component 912 and forwards them to the far memory bank pipeline control component 916, which is connected to the far memory bank storage component 917, the far memory bank result return component 918, and the memory 919. The far memory bank result return component 918 can read data from the memory 919 and is responsible for returning results to the processor.
It should be noted that the near and far memory bank receiving components may be implemented in hardware as queues or FIFO (First In First Out) queues, without limitation here. The near and far memory bank storage components are used to store cache line information and may take the form of static random access memory (SRAM), dynamic random access memory (DRAM), or the like, without limitation here. The memory may be on-chip or off-chip storage, without limitation here.
For example, the cache device may be implemented in hardware, software, firmware, or any feasible combination thereof, without limitation here.
At least one embodiment of the present disclosure further provides a cache management device, comprising: a memory for non-transitorily storing computer-executable instructions; and a processor for running the computer-executable instructions, wherein the computer-executable instructions, when run by the processor, perform the cache management method provided by at least one embodiment of the present disclosure.
Figure 10 shows a schematic diagram of a cache management device 1000 according to an embodiment of the present disclosure. As shown in Figure 10, the cache management device 1000 may comprise a processing device 1001 and a memory 1002, which may be interconnected via a bus 1003.
The processing device 1001 can perform various actions and processing according to programs or code stored in the memory 1002. Specifically, the processing device 1001 may be an integrated circuit chip with signal processing capability. For example, the processing device may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, flows, and logic block diagrams disclosed in the embodiments of the present disclosure. The general-purpose processor may be a microprocessor or any conventional processor, and may be of the X86 architecture, the ARM architecture, or the like.
The memory 1002 stores computer-executable instructions which, when executed by the processing device 1001, implement the cache management method provided by at least one embodiment of the present disclosure. The memory 1002 may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). It should be noted that the memory of the methods described herein is intended to include, without being limited to, these and any other suitable types of memory.
At least one embodiment of the present disclosure further provides an electronic device comprising a cache, the cache device provided by at least one embodiment of the present disclosure, and multiple processor cores. In one embodiment, the electronic device is, for example, a central processing unit, e.g., a single-core or multi-core processor. In one embodiment, the electronic device is a computer system that includes one or more processors.
Figure 11 shows a schematic diagram of an electronic device 1100 according to an embodiment of the present disclosure. As shown in Figure 11, the electronic device 1100 may comprise a cache device 900, a cache 1101, and multiple cores 1102.
At least one embodiment of the present disclosure provides a computer-readable storage medium for non-transitorily storing computer-executable instructions which, when executed by a processor, implement the cache management method provided by at least one embodiment of the present disclosure.
Similarly, the computer-readable storage medium in the embodiments of the present disclosure may be volatile memory or non-volatile memory, or may include both. It should be noted that the memory of the methods described herein is intended to include, without being limited to, these and any other suitable types of memory.
Embodiments of the present disclosure further provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the cache management method according to the embodiments of the present disclosure.
The technical effects of the above cache device, cache management device, electronic device, and storage medium are the same as those of the cache management method shown in Figure 4 and are not repeated here.
The following points should be noted:
(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure; for other structures, reference may be made to common designs.
(2) Without conflict, the embodiments of the present disclosure and the features therein may be combined with one another to obtain new embodiments.
The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto; the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (21)
1. A cache management method for a shared cache shared by a plurality of processor cores, the cache management method comprising: allocating a respective near memory bank and a respective far memory bank to each of the processor cores; and, for a memory access request of each of the processor cores, preferentially accessing the corresponding near memory bank, and then accessing the corresponding far memory bank.
2. The cache management method according to claim 1, wherein, for each of the processor cores, preferentially accessing the corresponding near memory bank and then accessing the corresponding far memory bank comprises: operating on the corresponding near memory bank according to a physical address of a read request in the memory access request; and, in a case where the read request misses in the near memory bank, routing the read request to the corresponding far memory bank according to the physical address of the read request, and operating on the corresponding far memory bank.
3. The cache management method according to claim 2, further comprising: in a case where the read request misses in both the corresponding near memory bank and the corresponding far memory bank, operating on other processor cores by consulting a directory of the shared cache.
4. The cache management method according to claim 2, wherein operating on the corresponding near memory bank according to the physical address of the read request in the memory access request comprises: in a case where the read request hits in the corresponding near memory bank, returning data in the near memory bank to the processor core.
5. The cache management method according to claim 2, wherein operating on the corresponding near memory bank according to the physical address of the read request in the memory access request comprises: in a case where the read request hits in the corresponding near memory bank, invalidating a copy in the corresponding near memory bank, and updating the directory of the shared cache.
6. The cache management method according to any one of claims 2, 4, and 5, wherein operating on the corresponding far memory bank comprises: in a case where the read request hits in the corresponding far memory bank, returning data in the corresponding far memory bank to the processor core.
7. The cache management method according to claim 6, wherein operating on the corresponding far memory bank further comprises: retaining the copy in the far memory bank, increasing aging information, stored in the corresponding far memory bank, of the cache line corresponding to the data, and updating the directory of the shared cache.
8. The cache management method according to claim 3, wherein operating on other processor cores by consulting the directory of the shared cache comprises: in a case where the data to be found exists in the other processor cores, returning the data in the other processor cores to the processor core that issued the read request, and updating the directory in the shared cache; and, in a case where the data to be found does not exist in the other processor cores, sending the read request to a memory to obtain the data to be found.
9. The cache management method according to claim 1, wherein, for each of the processor cores, preferentially accessing the corresponding near memory bank and then accessing the corresponding far memory bank comprises: operating on the corresponding near memory bank according to a physical address of a write-back request in the memory access request; and, in a case where the corresponding near memory bank generates a cache line migration operation, operating on the corresponding far memory bank according to address information of a first victim cache line of the corresponding near memory bank.
10. The cache management method according to claim 9, wherein operating on the corresponding near memory bank according to the physical address of the write-back request in the memory access request comprises: in a case where the write-back request hits in the corresponding near memory bank, updating a state stored in the corresponding near memory bank; and, in a case where the write-back request misses in the corresponding near memory bank, setting the first victim cache line in the corresponding near memory bank through a replacement algorithm.
11. The cache management method according to claim 10, wherein storing the first victim cache line in the corresponding near memory bank through the replacement algorithm comprises: in a case where a free cache line exists in the corresponding near memory bank to store the first victim cache line, storing the first victim cache line in the corresponding near memory bank, and updating the directory of the shared cache; and, in a case where no free cache line exists in the corresponding near memory bank to store the first victim cache line, generating, by the corresponding near memory bank, a second victim cache line, performing a migration operation on the second victim cache line, marking a flag indicating write-back to the far memory bank, sending the second victim cache line together with aging information corresponding to the second victim cache line to the corresponding far memory bank, and updating the directory of the shared cache.
12. The cache management method according to claim 11, wherein operating on the corresponding far memory bank according to the address information of the first victim cache line of the corresponding near memory bank comprises: in a case where the write-back request hits in the corresponding far memory bank, updating a state of a cache line in the corresponding far memory bank; and, in a case where the write-back request misses in the corresponding far memory bank, determining, according to a replacement algorithm, whether a third victim cache line will be selected in the corresponding far memory bank to be written back to a memory.
13. The cache management method according to claim 12, wherein determining, according to the replacement algorithm, whether a third victim cache line will be selected in the corresponding far memory bank to be written back to the memory comprises: in a case where the replacement algorithm shows that a free cache line is available in the corresponding far memory bank, storing the second victim cache line in the corresponding far memory bank; and, in a case where the replacement algorithm shows that no free cache line is available in the corresponding far memory bank, writing the second victim cache line or the third victim cache line back to the memory.
14. The cache management method according to claim 13, wherein writing the second victim cache line or the third victim cache line back to the memory comprises: comparing an aging value of the second victim cache line with an aging value of the third victim cache line; and writing back to the memory whichever of the second victim cache line and the third victim cache line has the larger aging value; or, in a case where the aging value of the second victim cache line is equal to the aging value of the third victim cache line, writing the second victim cache line back to the memory.
15. The cache management method according to claim 13, wherein writing the second victim cache line or the third victim cache line back to the memory comprises: preferentially writing back the second victim cache line or preferentially writing back the third victim cache line according to a register configuration.
16. The cache management method according to claim 13, wherein writing the second victim cache line or the third victim cache line back to the memory comprises: consulting the directory of the shared cache, and, in a case where the second victim cache line exists in a near memory bank corresponding to another processor core, writing the second victim cache line in the near memory bank corresponding to the other processor core back to the memory.
- 一种缓存装置,包括:A cache device, comprising:用于多个处理器核共享的共享缓存,所述共享缓存包括多个存储体,A shared cache for multiple processor cores to share, the shared cache comprising multiple memory banks,缓存管理单元,配置为对每个处理器核分配各自的近存储体和远存储体,且使得对每个处理器核的访存请求优先访问对应的近存储体,再访问对应的远存储体。The cache management unit is configured to allocate a respective near memory bank and far memory bank to each processor core, and to make the memory access request to each processor core access the corresponding near memory bank first, and then access the corresponding far memory bank.
- 根据权利要求17所述的缓存装置,其中,所述缓存管理单元包括:The cache device according to claim 17, wherein the cache management unit comprises:近存储体接收组件,配置为接收发送到所述对应的近存储体的访存请求;A near storage receiving component configured to receive a memory access request sent to the corresponding near storage;远存储体接收组件,配置为接收发送到所述对应的远存储体的访存请求;A remote storage receiving component configured to receive a memory access request sent to the corresponding remote storage;近存储体流水控制组件,配置为判断所述对应的近存储体接收到的访存请求的处理方式和是否在所述对应的近存储体中命中,以及执行对于所述对应的近存储体的替换算法;A near memory bank pipeline control component configured to determine a processing mode of a memory access request received by the corresponding near memory bank and whether the memory access request is hit in the corresponding near memory bank, and to execute a replacement algorithm for the corresponding near memory bank;远存储体流水控制组件,配置为判断所述对应的远存储体接收到的访存请求的处理方式和是否在所述对应的远存储体中命中,以及执行对于所述对应的远存储体的替换算法;A far memory bank pipeline control component configured to determine a processing mode of a memory access request received by the corresponding far memory bank and whether the memory access request is hit in the corresponding far memory bank, and to execute a replacement algorithm for the corresponding far memory bank;近存储体返回结果组件,将所述处理器核需要的结果返回给所述处理器核;A near storage body return result component returns the result required by the processor core to the processor core;远存储体返回结果组件,将所述处理器核需要的结果返回给所述处理器核。The far storage returns the result component, and returns the result required by the processor core to the processor core.
- 一种缓存管理装置,包括:A cache management device, comprising:处理器;以及Processor; and存储器,存储有计算机可执行指令,a memory storing computer executable instructions,其中,所述计算机可执行指令在被所述处理器执行时实现根据权利要求1-16中任一项所述的缓存管理方法。Wherein, when the computer executable instructions are executed by the processor, the cache management method according to any one of claims 1-16 is implemented.
- 一种电子装置,包括缓存和权利要求17所述的缓存装置以及多个处理器核。An electronic device comprises a cache, the cache device as claimed in claim 17, and a plurality of processor cores.
- 一种计算机可读存储介质,用于非瞬时性地存储计算机可执行指令,A computer-readable storage medium for non-transitory storage of computer-executable instructions,其中,所述计算机可执行指令在被处理器执行时实现根据权利要求1-16 中任一项所述的缓存管理方法。 Wherein, the computer executable instructions, when executed by the processor, implement the following claims 1-16 The cache management method described in any one of the above.
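To ground the device claims, the following is a minimal C++ sketch of the behavior recited in claims 17 and 18: each processor core is allocated its own near memory bank and far memory bank inside the shared cache, and a memory access request probes the near bank before falling back to the far bank. All identifiers here (SharedCache, Bank, lookup, and so on) are illustrative assumptions rather than names taken from the patent, and the structure is a simplified software model, not the claimed hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical model of the claimed shared cache: a set of banks, with a
// per-core mapping to one "near" bank and one "far" bank.
struct Bank {
    // tag -> cache line payload (a real design would also track state bits)
    std::unordered_map<uint64_t, std::vector<uint8_t>> lines;
};

class SharedCache {
public:
    SharedCache(std::size_t num_banks, std::size_t num_cores)
        : banks_(num_banks), near_of_(num_cores), far_of_(num_cores) {}

    // The cache management unit's allocation step: assign each core its own
    // near memory bank and far memory bank (claim 17).
    void assign(std::size_t core, std::size_t near_bank, std::size_t far_bank) {
        near_of_[core] = near_bank;
        far_of_[core] = far_bank;
    }

    // A memory access request from `core` probes the near bank first and
    // only then the far bank — the access ordering required by claim 17.
    std::optional<std::vector<uint8_t>> lookup(std::size_t core, uint64_t tag) {
        for (std::size_t b : {near_of_[core], far_of_[core]}) {
            auto it = banks_[b].lines.find(tag);
            if (it != banks_[b].lines.end())
                return it->second;  // hit in the near or the far bank
        }
        return std::nullopt;  // miss in both banks: forward to memory
    }

private:
    std::vector<Bank> banks_;
    std::vector<std::size_t> near_of_, far_of_;
};
```

Under these assumptions, a miss in both banks is forwarded to memory, which is where the victim write-back policies of claims 13-16 come into play.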
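The write-back selection of claims 14 and 15 likewise reduces to a short decision procedure: compare the two victims' aging values, write back the one that has aged more, break ties in favor of the second victim, and optionally let a register setting override the comparison with a fixed priority. A hedged sketch under the same assumptions, with invented names (VictimLine, choose_by_aging, prefer_second):

```cpp
#include <cstdint>

// Hypothetical victim descriptor; `aging` stands in for the aging value
// the claims compare. Field and type names are assumptions.
struct VictimLine {
    uint64_t tag;
    uint32_t aging;
};

enum class WriteBackChoice { Second, Third };

// Claim 14: write back whichever victim has the larger aging value;
// when the two aging values are equal, write back the second victim.
WriteBackChoice choose_by_aging(const VictimLine& second, const VictimLine& third) {
    if (third.aging > second.aging)
        return WriteBackChoice::Third;
    return WriteBackChoice::Second;  // larger aging, or the tie-break case
}

// Claim 15: a register configuration overrides the comparison and fixes
// the write-back priority to one victim or the other.
WriteBackChoice choose_by_register(bool prefer_second) {
    return prefer_second ? WriteBackChoice::Second : WriteBackChoice::Third;
}

// Claim 16 adds a directory lookup (not modeled here): if another core's
// near memory bank also holds the second victim line, that copy is the one
// written back to memory.
```

Note that choose_by_aging returns the second victim both when its aging value is larger and when the two values are equal, matching the tie-break recited in claim 14.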
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211183443.X | 2022-09-27 | ||
CN202211183443.XA CN115617709A (en) | 2022-09-27 | 2022-09-27 | Cache management method and device, cache device, electronic device and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024066195A1 true WO2024066195A1 (en) | 2024-04-04 |
Family
ID=84859739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/078664 WO2024066195A1 (en) | 2022-09-27 | 2023-02-28 | Cache management method and apparatus, cache apparatus, electronic apparatus, and medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115617709A (en) |
WO (1) | WO2024066195A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115617709A (en) * | 2022-09-27 | 2023-01-17 | 海光信息技术股份有限公司 | Cache management method and device, cache device, electronic device and medium |
CN117093511B (en) * | 2023-09-04 | 2024-05-10 | 海光云芯集成电路设计(上海)有限公司 | Access control method, access control device, chip and electronic equipment |
CN117851278B (en) * | 2024-03-08 | 2024-06-18 | 上海芯联芯智能科技有限公司 | Method for sharing static random access memory and central processing unit |
- 2022-09-27: CN application CN202211183443.XA filed (CN115617709A, active, pending)
- 2023-02-28: PCT application PCT/CN2023/078664 filed (WO2024066195A1, status unknown)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180189207A1 (en) * | 2011-09-30 | 2018-07-05 | Intel Corporation | Memory channel that supports near memory and far memory access |
CN105095109A (en) * | 2014-05-21 | 2015-11-25 | 华为技术有限公司 | Cache access method, cache access router and computer system |
CN106663058A (en) * | 2014-06-24 | 2017-05-10 | 高通股份有限公司 | Disunited shared-information and private-information caches |
CN104699631A (en) * | 2015-03-26 | 2015-06-10 | 中国人民解放军国防科学技术大学 | Storage device and fetching method for multilayered cooperation and sharing in GPDSP (General-Purpose Digital Signal Processor) |
CN115617709A (en) * | 2022-09-27 | 2023-01-17 | 海光信息技术股份有限公司 | Cache management method and device, cache device, electronic device and medium |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118626409A (en) * | 2024-08-12 | 2024-09-10 | 北京微核芯科技有限公司 | Method, device, equipment and medium for processing write request |
Also Published As
Publication number | Publication date |
---|---|
CN115617709A (en) | 2023-01-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2024066195A1 (en) | Cache management method and apparatus, cache apparatus, electronic apparatus, and medium | |
US10019377B2 (en) | Managing cache coherence using information in a page table | |
US9792221B2 (en) | System and method for improving performance of read/write operations from a persistent memory device | |
US8185692B2 (en) | Unified cache structure that facilitates accessing translation table entries | |
US20100325374A1 (en) | Dynamically configuring memory interleaving for locality and performance isolation | |
US10078588B2 (en) | Using leases for entries in a translation lookaside buffer | |
JP2000250813A (en) | Data managing method for i/o cache memory | |
US12093186B2 (en) | Process dedicated in-memory translation lookaside buffers (TLBs) (mTLBs) for augmenting memory management unit (MMU) TLB for translating virtual addresses (VAs) to physical addresses (PAs) in a processor-based system | |
JPH08272693A (en) | Conversion table entry provided with cache possibility attribute bit regarding virtual address as well as method and apparatus for reference of said virtual address using said bit | |
US20160140042A1 (en) | Instruction cache translation management | |
JPWO2010035426A1 (en) | Buffer memory device, memory system, and data transfer method | |
US6560681B1 (en) | Split sparse directory for a distributed shared memory multiprocessor system | |
JP6027562B2 (en) | Cache memory system and processor system | |
US11126573B1 (en) | Systems and methods for managing variable size load units | |
US20110167223A1 (en) | Buffer memory device, memory system, and data reading method | |
US6311253B1 (en) | Methods for caching cache tags | |
US7743215B2 (en) | Cache-memory control apparatus, cache-memory control method and computer product | |
CN113138851A (en) | Cache management method and device | |
US10565111B2 (en) | Processor | |
JPH1091521A (en) | Duplex directory virtual cache and its control method | |
US11841800B2 (en) | Apparatus and method for handling stash requests | |
JP2685455B2 (en) | Data processing device | |
US20060015689A1 (en) | Implementation and management of moveable buffers in cache system | |
EP4116829B1 (en) | Systems and methods for managing variable size load units | |
US11669450B2 (en) | Computer including cache used in plural different data sizes and control method of computer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23869458; Country of ref document: EP; Kind code of ref document: A1 |