CN116795740A - Data access method, device, processor, computer system and storage medium - Google Patents
- Publication number
- CN116795740A (application number CN202310390378.6A)
- Authority
- CN
- China
- Prior art keywords
- cache
- page size
- page
- translation
- virtual address
- Prior art date
- Legal status
- Pending
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The application provides a data access method, a data access device, a processor, a computer system and a storage medium. The data access method comprises the following steps: searching a translation lookaside buffer (TLB) for a cache entry matching a virtual address specified by a translation request, wherein the virtual address corresponds to a first page size; when no matching cache entry exists in the translation lookaside buffer, allocating a cache line corresponding to the first page size; acquiring an entry to be backfilled that matches the virtual address from a root page table, wherein the entry to be backfilled maps a second page size; and when the second page size is smaller than the first page size, setting the virtual address to correspond to the second page size and searching the translation lookaside buffer again for a matching cache entry. Because the preset page size can be the page size with the highest probability of occurrence, the hit rate of the translation lookaside buffer can be improved, latency can be reduced, and the traffic pressure on the memory management unit can be relieved.
Description
Technical Field
The present application relates to the field of processors, and more particularly, to a data access method, apparatus, processor, computer system, and storage medium.
Background
In existing computer systems, application programs running on the processor generally access related resources through the virtual addresses (Virtual Address, VA) of virtual memory, and the virtual storage space of the computer system is managed by means of these virtual addresses.
During memory access, virtual addresses need to be translated into physical addresses (Physical Address, PA). In order to perform translation between virtual and physical addresses, a computer system needs to store a large number of entries, each of which translates a specified range of virtual addresses into the corresponding physical addresses.
A Translation Lookaside Buffer (TLB) is a cache used to hold a portion of the entries stored in the computer system, so that address translation does not have to look up the entries stored in the computer system's memory every time it occurs. If the virtual address to be translated matches one of the entries cached in the TLB (referred to as a hit), the memory management unit (Memory Management Unit, MMU) can directly use the matching entry in the TLB to perform the address translation without a lookup in a memory device external to the TLB. If none of the entries cached in the TLB match the virtual address to be translated, a miss occurs.
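As an illustrative sketch (not the hardware implementation described in this application), the hit/miss behavior above can be modeled as a mapping from virtual page numbers to physical page numbers; the function name and the 4K page size are assumptions for illustration only:

```python
PAGE_SHIFT = 12  # assume 4K pages: the low 12 bits are the in-page offset

def tlb_translate(tlb, virtual_address):
    """Return the physical address on a TLB hit, or None on a miss.

    `tlb` maps virtual page numbers to physical page numbers.
    """
    vpn = virtual_address >> PAGE_SHIFT                  # virtual page number
    offset = virtual_address & ((1 << PAGE_SHIFT) - 1)   # offset within the page
    if vpn in tlb:                                       # hit: translate directly
        return (tlb[vpn] << PAGE_SHIFT) | offset
    return None                                          # miss: must walk the page table

# Example: virtual page 0x5 maps to physical page 0x9.
tlb = {0x5: 0x9}
assert tlb_translate(tlb, 0x5123) == 0x9123  # hit
assert tlb_translate(tlb, 0x7123) is None    # miss
```

On a miss, the MMU would fall back to the page table walk described below and backfill the returned entry into the TLB.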
If the TLB misses, translation between the virtual and physical address is completed by having a page table walker (Page Table Walker, PTW) access the page table entries stored in the computer system's memory and use them as backfill entries. Existing processors need to support page tables with various page sizes (such as 4K, 64K, 2M, etc.). Presetting the TLB to cache page tables of a small page size (4K) causes many unnecessary misses, which increases memory access latency and also places a large traffic pressure on the memory management unit. Presetting the TLB to cache page tables of a large page size (64K) can effectively reduce the TLB miss rate and improve the hit rate, but increases the address translation error rate.
Therefore, how to improve the hit rate of the translation lookaside buffer, reduce latency, and relieve the traffic pressure on the memory management unit is one of the problems to be solved.
Disclosure of Invention
The embodiments of the application provide a data access method, a device, a processor, a computer system and a storage medium. By searching the TLB again for the page table entry, the translation lookaside buffer can be preset to the page size with the highest probability of occurrence, which improves the hit rate of the TLB, reduces latency, relieves the traffic pressure on the memory management unit, and improves address translation efficiency.
The data access method according to an embodiment of the present application includes: searching a translation lookaside buffer for a cache entry matching a virtual address specified by a translation request, wherein the virtual address corresponds to a first page size; when no matching cache entry exists in the translation lookaside buffer, allocating a cache line corresponding to the first page size; acquiring an entry to be backfilled that matches the virtual address from a root page table, wherein the entry to be backfilled maps a second page size; and when the second page size is smaller than the first page size, setting the virtual address to correspond to the second page size and searching the translation lookaside buffer again for a matching cache entry.
In the above method, each cache line corresponds to a cache line tag, the cache line tag has a size flag bit, and the size flag bit is used to indicate the page size corresponding to the cache line.
The method as described above, wherein the method further comprises:
modifying the size flag bit to correspond to the second page size when the second page size is smaller than the first page size.
In the above method, each cache line corresponds to a cache line index, and the cache line index has a repeat lookup flag bit, where the repeat lookup flag bit is used to indicate whether the translation lookaside buffer needs to be searched again for a matching cache entry.
The method as described above, wherein the method further comprises:
setting the repeat lookup flag bit to valid when the second page size is smaller than the first page size.
The method as described above, wherein when the number of translation requests exceeds the maximum capacity of the miss status holding register, no new translation requests are received.
The method as described above, wherein the method further comprises:
translating the virtual address into a physical address when the second page size is greater than or equal to the first page size.
The data access device according to an embodiment of the present application includes a memory management unit and a translation lookaside buffer, wherein the data access device is configured to: search the translation lookaside buffer for a cache entry matching a virtual address specified by a translation request received by the memory management unit, wherein the virtual address corresponds to a first page size; when no matching cache entry exists in the translation lookaside buffer, allocate a cache line corresponding to the first page size; acquire an entry to be backfilled that matches the virtual address from a root page table, wherein the entry to be backfilled maps a second page size; and when the second page size is smaller than the first page size, set the virtual address to correspond to the second page size and search the translation lookaside buffer again for a matching cache entry.
In the above apparatus, each cache line corresponds to a cache line tag, the cache line tag has a size flag bit, and the size flag bit is used to indicate the page size corresponding to the cache line.
The apparatus as described above, wherein the size flag bit is modified to correspond to the second page size when the second page size is smaller than the first page size.
The apparatus as described above, wherein each cache line corresponds to a cache line index, and the cache line index has a repeat lookup flag bit, where the repeat lookup flag bit is used to indicate whether the translation lookaside buffer needs to be searched again for a matching cache entry.
The apparatus as described above, wherein the repeat lookup flag bit is set to valid when the second page size is smaller than the first page size.
The apparatus as described above, wherein when the number of translation requests exceeds the maximum capacity of the miss status holding register, no new translation requests are received.
The apparatus as described above, wherein the apparatus is further configured to:
translate the virtual address into a physical address when the second page size is greater than or equal to the first page size.
The processor according to an embodiment of the present application includes any of the data access devices described above.
A computer system according to an embodiment of the application, comprising any of the processors described above.
The storage medium according to an embodiment of the present application is used for storing a computer program for executing any one of the data access methods described above.
These and other features of the disclosed system, method, and hardware device, as well as the methods of operation and functions of the related elements of structure, as well as the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, wherein like reference numerals designate corresponding parts in the figures. It is to be understood, however, that the drawings are designed solely for the purposes of illustration and description and are not intended as a definition of the limits of the application.
The application will now be described in more detail with reference to the drawings and specific examples, which are not intended to limit the application thereto.
Drawings
FIG. 1 is a schematic diagram of a computer system according to an embodiment of the application.
FIG. 2 is a schematic diagram of a processor according to an embodiment of the application.
FIG. 3 is a diagram illustrating a relationship between virtual addresses and page sizes according to an embodiment of the application.
FIG. 4 is a flow chart illustrating a data access method according to an embodiment of the application.
Detailed Description
The structural and operational principles of the present application are described in detail below with reference to the accompanying drawings:
the present application is presented to enable one of ordinary skill in the art to make and use the embodiments and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the specification. Thus, the present specification is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
For a better understanding of the present application, some of the terms used herein are explained as follows.
Computer systems, which may represent desktop, server, general-purpose embedded systems, or other data processing systems.
Memory, which may represent a storage device in a computer system, may also be referred to as memory or main memory.
A physical address (Physical Address PA), which may also be referred to as a real address, is provided to the address bus to access the memory.
The virtual address (Virtual Address VA), which may also be referred to as a logical address, is an abstract address used by the program. The virtual address space may be larger than the physical address space, and the virtual address may be mapped to a corresponding physical address.
Page (page) refers to dividing a virtual address space into multiple parts, each part being a virtual page, and dividing a physical address space into multiple parts, each part being a physical page.
A root page table, stored in a memory, for reflecting a correspondence between a virtual page and a physical page, generally includes a plurality of entries, each of which includes a mapping relationship from the virtual page to the physical page and some attribute information, so as to be used for translating a virtual address in the virtual page into a physical address in the corresponding physical page.
A Translation Lookaside Buffer (TLB) is used to cache entries in the root page table that are likely to be used frequently, so that they can be retrieved quickly during address translation; it is sometimes also called a "fast table" because it speeds up the address translation process.
Cache entry: an entry stored in the translation lookaside buffer (TLB) is defined as a cache entry.
Backfill entry: an entry retrieved from the root page table based on TLB miss information also needs to be written back into the translation lookaside buffer (TLB), and is therefore referred to as a backfill entry.
FIG. 1 is a schematic diagram of a computer system according to an embodiment of the application. As shown in fig. 1, computer system 100 includes one or more processors 101. The processor 101 may be, for example, a stand-alone processor, or a processing circuit in a system on chip (System on Chip, SoC), including a microprocessor, a computing processing unit, a digital signal processing unit, a Complex Instruction Set Computing (CISC) microprocessor, a Reduced Instruction Set Computing (RISC) microprocessor, a Graphics Processing Unit (GPU), a Neural network Processing Unit (NPU), a dedicated Data Processing Unit (DPU), or any other type of processor or processing circuit that may be implemented by an integrated circuit.
Specifically, as shown in fig. 1, in the computer system 100, the processor 101 may exchange signals, such as address, data, or control signals, with other devices via the bus 102. Bus 102 may be a processor bus such as a direct media interface (Direct Media Interface, DMI) bus, a bus based on the peripheral component interconnect standard (Peripheral Component Interconnect, PCI), a memory bus, or another type of bus.
Other devices may be, for example, memory 103, display device 104, input device 105, network device 106, and data acquisition device 107.
Memory 103, which is the main memory of computer system 100, may be dynamic random access memory (Dynamic Random Access Memory, DRAM), static random access memory (Static Random-Access Memory, SRAM), or another type of memory.
The display device 104 is used to display an interface or information that the processor needs to display to the user.
An input device 105, such as a keyboard, mouse, touch panel, etc., is used to communicate user operations to the corresponding processor 101 to effect control of the computer system 100.
Network device 106 is used to enable computer system 100 to access a network, and may include wireless network devices and/or wired network devices.
The data acquisition device 107 is used to acquire information such as images, sounds, temperature, etc., and may be, for example, a microphone, a video camera, a temperature sensor, etc.
It should be noted that fig. 1 only illustrates one component architecture of the computer system 100, and the computer system 100 may further include other devices, which is not limited to the present application.
FIG. 2 is a schematic diagram of a processor according to an embodiment of the application. As shown in FIG. 2, processor 101 of the present application may include at least one core 1010, each core including a processing unit 1011, a first cache 1012, a second cache 1013, a memory management unit 1014, and a translation lookaside buffer 1015. The third cache 1016 is shared among the cores 1010; the first cache 1012 is located near the processing unit 1011, and the second cache 1013 sits between the first cache 1012 and the third cache 1016. The present application is described with respect to a three-level cache only, although other cache structures are possible.
The processing unit 1011 issues a memory access instruction, and the memory management unit 1014 first queries the translation lookaside buffer 1015 for the physical address corresponding to the virtual address. If there is a cache entry in the translation lookaside buffer 1015 that matches the virtual address, the virtual address is translated into a physical address based on that cache entry; this is referred to as a TLB hit. If there is no cache entry in the translation lookaside buffer 1015 that matches the virtual address, referred to as a TLB miss, the memory management unit 1014 accesses the root page table in memory to obtain the corresponding entry. Thus, the translation lookaside buffer 1015 can reduce the number of memory accesses made by the memory management unit 1014, saving address translation time.
It should be noted that fig. 2 only illustrates one architecture of the processor 101, and the processor 101 may further include registers for storing different types of data and/or instructions, for example, but not limited to, integer registers, floating point registers, status registers, instruction registers, pointer registers, and the like.
To better manage the address space exclusive to each process, the computer system may allocate separate virtual address spaces for some processes and provide virtual to physical address mappings to map or de-map the virtual address spaces to the physical address spaces.
Since data in a computer system is typically transferred in units of pages, the computer system and/or operating system typically manages the physical address space and the virtual address space in units of pages. The virtual address space may be larger than the physical address space; that is, one virtual page in the virtual address space may be mapped to one physical page in the physical address space, or to a swap file, or may have no mapped content at all.
Based on the page management mechanism, the mapping relationship between each virtual page in the virtual address space and each physical page in the physical address space may be stored in a root page table of the memory. Root page tables typically include a number of page table entries (Entry), each for providing a mapping relationship between a virtual page and a corresponding physical page, so that virtual addresses in the virtual page that match the Entry can be translated into corresponding physical addresses in accordance with the page table Entry.
For a process, the virtual address range (which may be referred to as the page size of the virtual page) corresponding to each virtual page should be consistent with the page size of the corresponding physical page, such as, but not limited to, 4K, 64K, 2M, etc. It should be added that, for different processes, the page sizes of the corresponding virtual pages may be consistent or inconsistent; similarly, the page size of the corresponding physical page may or may not be consistent for different processes, with different embodiments having different choices.
FIG. 3 is a diagram illustrating a relationship between virtual addresses and page sizes according to an embodiment of the application. As shown in FIG. 3, virtual address VA comprises 48 bits, bit 0 through bit 47.
For a page table structure with a page size of 4K, the lowest 12 bits (i.e., bits 0 through 11) of the virtual address VA are the in-page offset, and the remaining 36 bits (i.e., bits 12 through 47) are the corresponding virtual page number. After receiving the translation request, the matching cache entry is looked up in the translation lookaside buffer according to the 36-bit virtual page number in virtual address VA.
For a page table structure with a page size of 64K, the lowest 16 bits (i.e., bits 0 through 15) of the virtual address VA are the in-page offset, and the remaining 32 bits (i.e., bits 16 through 47) are the corresponding virtual page number. After receiving the translation request, the matching cache entry is looked up according to the 32-bit virtual page number in virtual address VA.
For a page table structure with a page size of 2M, the lowest 21 bits (i.e., bits 0 through 20) of the virtual address VA are the in-page offset, and the remaining 27 bits (i.e., bits 21 through 47) are the corresponding virtual page number. After receiving the translation request, the matching cache entry is looked up according to the 27-bit virtual page number in virtual address VA.
The virtual address VA is divided in a similar manner for page table structures of other page sizes, or for other bits of virtual address, and will not be described in detail herein. The following describes the page table structure of two page sizes, 4K and 64K, but the present application is not limited thereto and can be applied to page table structures of other page sizes.
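The bit splits described above can be sketched as follows. This is an illustrative model of the 48-bit address layout only; the function name is an assumption, not part of the application:

```python
# Page sizes and their in-page offset widths for a 48-bit virtual address.
PAGE_SHIFTS = {"4K": 12, "64K": 16, "2M": 21}

def split_va(virtual_address, page_size):
    """Split a 48-bit virtual address into (virtual page number, in-page offset)."""
    shift = PAGE_SHIFTS[page_size]
    vpn = virtual_address >> shift            # 36, 32, or 27 bits wide
    offset = virtual_address & ((1 << shift) - 1)
    return vpn, offset

va = 0x0000_1234_5678
assert split_va(va, "4K") == (va >> 12, va & 0xFFF)      # 36-bit VPN, 12-bit offset
assert split_va(va, "64K") == (va >> 16, va & 0xFFFF)    # 32-bit VPN, 16-bit offset
assert split_va(va, "2M") == (va >> 21, va & 0x1F_FFFF)  # 27-bit VPN, 21-bit offset
```

Note that the same virtual address yields a different virtual page number under each page size, which is why the preset page size determines which bits are compared during the TLB lookup.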
For computer systems supporting multiple page sizes (4K, 64K), when a translation request misses in the TLB, a cache line is allocated and a page table walk is then requested from the memory management unit. There are two allocation modes:
first, cache lines are allocated at a minimum page size, e.g., 4K. Then, a lookup is performed in memory according to the 36-bit virtual page number in virtual address VA, so the returned page table may be either a page table with a page size of 4K or a page table with a page size of 64K. When the page table with the page size of 64K is returned, the virtual address VA in the 4K-64K interval is misjudged as miss, and the virtual address VA in the part of the interval needs to be translated again, so that the access delay is increased, and the flow pressure of the memory management unit is increased.
Second, cache lines are allocated according to a larger page size, e.g., 64K. A lookup is then performed in memory according to the 32-bit virtual page number in virtual address VA, so the returned page table may be either a page table with a page size of 64K or one with a page size of 4K. When a page table with a page size of 4K is returned, virtual addresses VA falling between 4K and 64K are misjudged, and a wrong translation result is returned.
In conventional computer systems, page tables with a page size of 64K are often used. Although the second mode can improve the hit rate, the first mode is generally adopted in order to avoid the translation errors of the second mode.
The question, then, is how to combine the advantages of the two modes: obtaining the performance improvement of the second mode while avoiding translation errors.
FIG. 4 is a flow chart illustrating a method for accessing data according to an embodiment of the application. Referring to fig. 1 to 4, a data access method 200 of the present application includes:
step S210, searching the matched cache table item.
First, each translation request is preconfigured to have a first page size, for example 64K. The preset page size of a translation request may be marked by adding a flag bit to the translation request, or may be preset by other means; the present application is not limited in this regard.
Then, after receiving the translation request, the translation lookaside buffer 1015 is searched for a cache entry matching the virtual address VA specified by the translation request, according to the 32-bit virtual page number of the virtual address VA. If one is found, this indicates a TLB hit, and the virtual address is translated into the corresponding physical address according to the matching cache entry in the translation lookaside buffer. This is a common translation procedure and is not described in detail here.
If there is no cache entry in the translation lookaside buffer 1015 that matches the virtual address VA specified by the translation request, indicating a TLB miss, the following steps are performed.
In step S220, a cache line is allocated.
When no matching cache entry is found in the translation lookaside buffer 1015, or in other words, there is no cache entry in the translation lookaside buffer 1015 that matches the virtual address VA specified by the translation request, a TLB miss is indicated. At this time, the backfill entry matching the virtual address VA must be found in memory; at the same time, the TLB miss information is recorded in the miss status holding register (Miss Status Holding Register, MSHR), and the translation lookaside buffer 1015 allocates a cache line to cache the page table where the entry to be backfilled is located, which is referred to as the return page table.
Each cache line corresponds to a unique cache line tag (tag) and a cache line index (index), and a valid bit and a size tag bit are set in the cache line tag. The cache line index is recorded in the miss status holding register and is provided with a repeat lookup flag bit and a cache line number.
The valid bit is used to indicate the state of the cache line. When the valid bit is valid, for example "1", it indicates that the cache line can be used to cache the return page table where the entry to be backfilled is located; when the valid bit is invalid, for example "0", it indicates that the cache line is occupied and cannot be used to cache the return page table. Of course, "1" may instead be used to indicate invalid and "0" to indicate valid; the present application is not limited in this regard.
The size flag bit is used to indicate the page size of the return page table that the cache line may cache, and may be set to 1 bit, 2 bits, or more, depending on the number of different page table sizes used by the computer system 100. Taking a 1-bit size flag as an example, [0] represents a page table size of 4K and [1] represents a page table size of 64K. The application is not limited thereto.
The repeat lookup flag bit is used to indicate whether the translation lookaside buffer 1015 needs to be searched again for a matching cache entry. When the repeat lookup flag bit is configured as valid, for example "1", it indicates that a repeat lookup is required; when it is configured as invalid, for example "0", it indicates that no repeat lookup is required and address translation can be performed directly. Of course, "0" may instead be used to indicate valid and "1" invalid; the present application is not limited in this regard.
The cache line number is used to indicate a cache line location allocated for the translation request, and when a return page table where the table entry to be backfilled matching the virtual address VA is located is found in the memory, the return page table is cached into the translation look-aside buffer 1015 according to the cache line number.
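The bookkeeping described above (valid bit and size flag in the cache line tag; repeat lookup flag and cache line number recorded in the MSHR) can be sketched as plain data structures. This is an illustrative model only; the field names and the 1-bit size encoding ([1] = 64K, [0] = 4K) follow the example above and are assumptions:

```python
from dataclasses import dataclass

@dataclass
class CacheLineTag:
    valid: bool = True   # True: the line may cache a return page table
    size_flag: int = 1   # 1 -> 64K page table, 0 -> 4K page table

@dataclass
class MSHREntry:
    cache_line_number: int       # which cache line was allocated for this request
    repeat_lookup: bool = False  # True: the TLB must be searched again on release

# On a miss with the preset 64K page size, allocate a line tagged as 64K.
tag = CacheLineTag(valid=True, size_flag=1)
mshr = MSHREntry(cache_line_number=3)

# If the page table walk returns a 4K page table (second page size < first),
# downgrade the size flag and mark the request for a repeat lookup.
returned_is_4k = True
if returned_is_4k:
    tag.size_flag = 0
    mshr.repeat_lookup = True

assert tag.size_flag == 0 and mshr.repeat_lookup
```

In hardware these would be a few flag bits alongside the tag and index arrays rather than software objects, but the state transitions are the same.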
S230, acquiring a return page table.
After the translation lookaside buffer 1015 allocates a cache line, the size flag bit in the cache line tag corresponding to the allocated cache line is configured to correspond to the first page size, for example configured as [1]. At the same time, the memory management unit 1014 is requested to perform a page table walk (Page Table Walk, PTW), searching from the root page table in memory to obtain the return page table where the entry to be backfilled matching the virtual address VA is located.
The return page table in which the entry to be backfilled is located has a second page size, which may be a page table with a page size of 64K or a page table with a page size of 4K.
When the second page size is the same as the first page size, for example 64K, the return page table is backfilled into the corresponding cache line of the translation lookaside buffer 1015 according to the cache line number in the cache line index. When the miss status holding register (MSHR) releases the entry, the virtual address VA can be translated into the physical address PA according to the entry to be backfilled in the return page table, i.e., normal address translation is performed.
S240, searching the matched cache table entry again.
When the second page size is smaller than the first page size, for example when the second page size is 4K, the repeat lookup flag bit is set to valid, for example "1". At the same time, the page size corresponding to the virtual address VA and the size flag bit in the cache line tag are set to correspond to the second page size, for example 4K. The translation lookaside buffer 1015 is then searched for a matching cache entry according to the 36-bit virtual page number of the virtual address VA.
When an entry is released from the MSHR, if the repeat lookup flag bit is set to invalid, the virtual address is translated normally and the corresponding physical address is returned. If the repeat lookup flag bit is set to valid, the size flag bit in the cache line tag is configured as [0], and the page table structure with a page size of 4K is used to search the translation lookaside buffer again for a matching cache entry according to the 36-bit virtual page number of the virtual address VA.
To prevent repeat lookups from forming a deadlock, no new translation requests are received when the number of translation requests exceeds the maximum capacity of the miss status holding register.
The above description uses only the page sizes 4K and 64K. For other page sizes, as long as the returned page size (the second page size) is smaller than the preset page size (the first page size), the matching cache entry needs to be searched for again; the specific steps are not repeated here.
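Putting steps S210-S240 together, the overall flow can be sketched as follows. This is a behavioral model under simplifying assumptions (a flat root page table keyed by page size and virtual page number; all function and variable names are invented for illustration), not the hardware implementation claimed by this application:

```python
SHIFT = {"64K": 16, "4K": 12}  # in-page offset widths for the two page sizes

def translate(tlb, root_page_table, va, preset="64K"):
    """Model of the method: look up with the preset (first) page size,
    walk the root page table on a miss, and repeat the lookup with the
    smaller (second) page size if the returned page table is smaller."""
    size = preset
    while True:
        vpn = va >> SHIFT[size]
        if (size, vpn) in tlb:                     # S210: TLB hit
            offset = va & ((1 << SHIFT[size]) - 1)
            return (tlb[(size, vpn)] << SHIFT[size]) | offset
        # S220/S230: miss -> allocate a line, walk the root page table,
        # and see which page size the returned page table maps.
        for walk_size in ("64K", "4K"):            # returned (second) page size
            entry = root_page_table.get((walk_size, va >> SHIFT[walk_size]))
            if entry is not None:
                tlb[(walk_size, va >> SHIFT[walk_size])] = entry  # backfill
                if SHIFT[walk_size] < SHIFT[size]:
                    size = walk_size               # S240: repeat lookup at 4K
                    break
                offset = va & ((1 << SHIFT[walk_size]) - 1)
                return (entry << SHIFT[walk_size]) | offset
        else:
            raise KeyError("no mapping for this virtual address")

# A 4K mapping: the 64K-preset first lookup misses, the walk returns a
# 4K page table, and the lookup is repeated at the 4K page size.
root = {("4K", 0x12345): 0x777}
pa = translate({}, root, (0x12345 << 12) | 0xABC)
assert pa == (0x777 << 12) | 0xABC
```

When the returned page size equals or exceeds the preset size, the model translates immediately, matching the "second page size greater than or equal to the first" branch of the method.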
Each of the processes, methods, and algorithms described in the preceding paragraphs may be embodied in, and fully or partially automated by, code modules executed by one or more computer systems or computer processors (including computer hardware). These processes and algorithms may be implemented in part or in whole in application specific circuitry.
When the functions disclosed herein are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a processor-executable non-volatile computer-readable storage medium. The technical solutions disclosed herein (in whole or in part), or the aspects that contribute to the current technology, may be embodied in the form of a software product. The software product may be stored in a storage medium containing a plurality of instructions that cause a computing device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods of the embodiments of the application. The storage medium may include a flash drive, a portable hard drive, ROM, RAM, a magnetic disk, an optical disc, another medium usable to store program code, or any combination thereof.
Particular embodiments further provide a system comprising a processor and a non-transitory computer readable storage medium storing instructions executable by the processor to cause the system to perform operations corresponding to steps in any of the methods of the embodiments described above. Particular embodiments also provide a non-transitory computer-readable storage medium configured with instructions executable by one or more processors to cause the one or more processors to perform operations corresponding to steps in any of the methods of the embodiments described above.
Embodiments disclosed herein may be implemented by a cloud platform, a server, or a group of servers (hereinafter collectively referred to as a "service system") that interacts with clients. The client may be a terminal device, or a client registered by a user with the platform, where the terminal device may be a mobile terminal, a personal computer (PC), or any device on which a platform application may be installed.
The various features and processes described above may be used independently of each other or may be combined in various ways. All possible combinations and subcombinations are intended to fall within the scope of this disclosure. Furthermore, certain methods or process blocks may be omitted in some embodiments. Nor are the methods and processes described herein limited to any particular order; the blocks or states associated with them may be performed in any other suitable order. For example, the described blocks or states may be performed in an order other than the one specifically disclosed, or multiple blocks or states may be combined into one block or state. The example blocks or states may be performed sequentially, concurrently, or in other ways. Blocks or states may be added to or removed from the disclosed example embodiments. The configuration of the example systems and components described herein may differ from that described. For example, elements may be added to, removed from, or rearranged relative to the disclosed example embodiments.
Various operations of the example methods described herein may be performed, at least in part, by algorithms. The algorithm may include program code or instructions stored in a memory (e.g., the non-transitory computer-readable storage medium described above). Such algorithms may include machine learning algorithms. In some embodiments, the machine learning algorithm may not explicitly program the computer to perform the function, but may learn from the training data to generate a predictive model to perform the function.
Various operations of the example methods described herein may be performed, at least in part, by one or more processors that are temporarily configured (e.g., via software) or permanently configured to perform the relevant operations. Whether temporarily configured or permanently configured, these processors may constitute a processor-implemented engine that operates to perform one or more of the operations or functions described herein.
The methods described herein may be implemented at least in part by a processor, with one or more particular processors being examples of hardware. For example, at least some operations of a method may be performed by one or more processors or processor-implemented engines. In addition, the one or more processors may also operate in a "cloud computing" environment or as "software as a service" (SaaS) to support performance of the relevant operations. For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors) accessible via a network (e.g., the Internet) and one or more suitable interfaces (e.g., application programming interfaces (APIs)).
The performance of certain operations may be distributed among processors, residing not only within one machine, but also across several machines. In some example embodiments, the processor or processor-implemented engine may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other exemplary embodiments, the processor or processor-implemented engine may be distributed over several geographic locations.
In this specification, multiple instances may implement a component, operation, or structure described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently and nothing requires that the operations be performed in the order illustrated. Structures and functions presented as separate components in the example configuration may be implemented as a combined structure or component. Likewise, structures and functions presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the subject matter herein.
While an overview of the subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of the disclosed embodiments. The term "application" may be used herein, alone or collectively, to refer to these embodiments of the subject matter for convenience only and is not intended to voluntarily limit the scope of this application to any single disclosure or concept if more than one is in fact disclosed.
The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the disclosed teachings. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The detailed description is, therefore, not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.
Any process descriptions, elements, or blocks in the flow charts described herein and/or depicted in the drawings should be understood as potentially representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of the embodiments described herein, in which elements or functions may be deleted or executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.
Herein, "or" is inclusive rather than exclusive, unless explicitly indicated otherwise or indicated by context. Thus, herein, "A, B, or C" means "A, B, C, A and B, A and C, B and C, or A, B and C," unless explicitly indicated otherwise by the context. Furthermore, "and" is both joint and several unless explicitly indicated otherwise or by context. Thus, herein, "A and B" means "A and B, jointly or severally," unless explicitly indicated otherwise by the context. Furthermore, multiple instances may be provided for a resource, operation, or structure described herein as a single instance. Furthermore, the boundaries between various resources, operations, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of particular illustrative configurations. Other allocations of functionality are contemplated and may fall within the scope of various embodiments of the present disclosure. In general, structures and functions presented as separate resources in the example configurations may be implemented as a combined structure or resource. Likewise, structure and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within the scope of the embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
The term "comprising" or "comprises" is used to indicate the presence of the subsequently stated features, but does not exclude the addition of other features. Conditional language, such as "may," "might," or "can," unless specifically stated otherwise or otherwise understood in the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or steps. Thus, such conditional language does not generally imply that the features, elements, and/or steps are in any way required by one or more embodiments, or that one or more embodiments must include logic for deciding, with or without user input or prompting, whether these features, elements, and/or steps are included in or are to be performed in any particular embodiment.
Of course, the present application is capable of other various embodiments and its several details are capable of modification and variation in light of the present application, as will be apparent to those skilled in the art, without departing from the spirit and scope of the application as defined in the appended claims.
Claims (17)
1. A method of data access, comprising:
searching, according to a virtual address specified by a translation request, whether a matching cache entry exists in a translation look-aside buffer, wherein the virtual address corresponds to a first page size;
when no matching cache entry exists in the translation look-aside buffer, allocating a cache line, wherein the cache line corresponds to the first page size;
acquiring, from a root page table, an entry to be backfilled that matches the virtual address, wherein the entry to be backfilled maps a second page size; and
when the second page size is smaller than the first page size, setting the virtual address to correspond to the second page size, and searching the translation look-aside buffer again for the matching cache entry.
2. The method of claim 1, wherein each of the cache lines corresponds to a cache line tag having a size flag bit therein, the size flag bit being used to indicate a page size to which the cache line corresponds.
3. The method according to claim 2, wherein the method further comprises:
modifying the size flag bit to correspond to the second page size when the second page size is smaller than the first page size.
4. The method of claim 3, wherein each cache line corresponds to a cache line index having a repeat lookup flag bit, the repeat lookup flag bit indicating whether a matching cache entry needs to be looked up again in the translation look-aside buffer.
5. The method according to claim 4, wherein the method further comprises:
setting the repeat lookup flag bit to valid when the second page size is smaller than the first page size.
6. The method of claim 5, wherein when the number of translation requests exceeds a maximum capacity of a miss status holding register, no new translation requests are received.
7. The method according to claim 6, wherein the method further comprises:
translating the virtual address into a physical address when the second page size is greater than or equal to the first page size.
8. A data access device comprising a memory management unit and a translation look-aside buffer, the data access device configured to:
searching, according to a virtual address specified by a translation request received by the memory management unit, whether a matching cache entry exists in the translation look-aside buffer, wherein the virtual address corresponds to a first page size;
when no matching cache entry exists in the translation look-aside buffer, allocating a cache line, wherein the cache line corresponds to the first page size;
acquiring, from a root page table, an entry to be backfilled that matches the virtual address, wherein the entry to be backfilled maps a second page size; and
when the second page size is smaller than the first page size, setting the virtual address to correspond to the second page size, and searching the translation look-aside buffer again for the matching cache entry.
9. The apparatus of claim 8, wherein each of the cache lines corresponds to a cache line tag having a size flag bit therein, the size flag bit indicating a page size to which the cache line corresponds.
10. The apparatus of claim 9, wherein the size flag bit is modified to correspond to the second page size when the second page size is smaller than the first page size.
11. The apparatus of claim 10, wherein each cache line corresponds to a cache line index having a repeat lookup flag bit, the repeat lookup flag bit indicating whether a matching cache entry needs to be looked up again in the translation look-aside buffer.
12. The apparatus of claim 11, wherein the repeat lookup flag bit is set to active when the second page size is smaller than the first page size.
13. The apparatus of claim 12, wherein when the number of translation requests exceeds a maximum capacity of a miss status holding register, no new translation requests are received.
14. The apparatus of claim 13, wherein the data access device is further configured to:
translate the virtual address into a physical address when the second page size is greater than or equal to the first page size.
15. A processor comprising a data access device as claimed in any one of claims 8 to 14.
16. A computer system comprising the processor of claim 15.
17. A storage medium storing a computer program for executing the data access method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310390378.6A CN116795740A (en) | 2023-04-12 | 2023-04-12 | Data access method, device, processor, computer system and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116795740A true CN116795740A (en) | 2023-09-22 |
Family
ID=88046984
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310390378.6A Pending CN116795740A (en) | 2023-04-12 | 2023-04-12 | Data access method, device, processor, computer system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116795740A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117290263A (en) * | 2023-09-27 | 2023-12-26 | 中科驭数(北京)科技有限公司 | Large page mapping realization method and system for large capacity BAR space |
CN117573576A (en) * | 2023-11-16 | 2024-02-20 | 上海壁仞科技股份有限公司 | Methods for configuring embedded tables in memory, methods for accessing memory, and computing devices |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |