CN117348934A - Data caching method, data caching device and processor
- Publication number
- CN117348934A (application number CN202311312225.6A)
- Authority
- CN
- China
- Prior art keywords
- queue
- data
- instruction
- target
- item
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3814—Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
- G06F9/381—Loop buffering
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
The disclosure provides a data caching method, a device and a processor, wherein the data caching method comprises the following steps: receiving target data obtained by executing a target instruction and a target address corresponding to the target data; writing the target address into a first entry in a first queue; the target data is written into a second item group in a second queue independent of the first queue, wherein the second item group comprises at least one continuous second item. The data caching method improves the utilization rate of hardware, and further improves the performance of the system.
Description
Technical Field
Embodiments of the present disclosure relate to a data caching method, a data caching device, and a processor.
Background
Computer processors support a variety of instruction sets, and a conventional single instruction handles at most 64 bits of data. To improve performance in application scenarios such as artificial intelligence, audio and video processing, and cryptography, current high-performance processors support vector instruction sets in addition to the conventional instruction set: for example, a SIMD instruction set allows a single instruction to process up to 128 bits of data; the AVX512 instruction set allows a single instruction to process up to 512 bits of data; and the SVE instruction set can in theory allow a single instruction to process up to 2048 bits of data. These new vector instruction sets deliver high-performance computation, but they also put pressure on the data storage unit.
Disclosure of Invention
At least one embodiment of the present disclosure provides a data caching method, including: receiving target data obtained by executing a target instruction and a target address corresponding to the target data; writing the target address into a first item in a first queue; and writing the target data into a second item group in a second queue independent of the first queue, wherein the second item group includes at least one consecutive second item.
For example, the data caching method provided in at least one embodiment of the present disclosure further includes: determining, based on the width of the target data and the width of a single item in the second queue, a first number of second items included in the second item group into which the target data is written in the second queue, and writing the first number into the first item.
For example, the data caching method provided in at least one embodiment of the present disclosure further includes: and writing a starting position corresponding to the second item group in the second queue into the first item.
For example, in the data caching method provided in at least one embodiment of the present disclosure, the first queue and the second queue are both ring queues, a start pointer of the first queue and a start pointer of the second queue correspond to each other, an end pointer of the first queue and an end pointer of the second queue correspond to each other, and a first item in the first queue and a second item group in the second queue correspond to each other.
For example, the data caching method provided in at least one embodiment of the present disclosure, before receiving the target data obtained by executing the target instruction and the target address corresponding to the target data, further includes: decomposing the target instruction to obtain a first sub-instruction and a second sub-instruction, wherein the target address is obtained after the first sub-instruction is executed and the target data is obtained after the second sub-instruction is executed; and determining, based on the first number, whether the number of items in the idle state in the first queue and the number of items in the idle state in the second queue are respectively sufficient to write the target address and the target data simultaneously.
For example, the data caching method provided in at least one embodiment of the present disclosure, before receiving the target data obtained by executing the target instruction and the target address corresponding to the target data, further includes: writing the first sub-instruction and the second sub-instruction into a third queue for executing the first sub-instruction and a fourth queue for executing the second sub-instruction, respectively, to wait for subsequent execution, in the case that the number of items in the idle state in the first queue and the number of items in the idle state in the second queue are sufficient to write the target address and the target data simultaneously; or,
suspending the processing of the first sub-instruction and the second sub-instruction in the case that at least one of the number of items in the idle state in the first queue and the number of items in the idle state in the second queue is insufficient to write the target address and the target data simultaneously.
For example, the data caching method provided in at least one embodiment of the present disclosure, before receiving the target data obtained by executing the target instruction and the target address corresponding to the target data, further includes: after the first sub-instruction is written into the third queue, storing in the third queue, for the first sub-instruction, the flag information of the item in the first queue corresponding to the end pointer; and after the second sub-instruction is written into the fourth queue, storing in the fourth queue, for the second sub-instruction, the flag information of the item in the second queue corresponding to the end pointer.
For example, the data caching method provided in at least one embodiment of the present disclosure further includes, after storing in the third queue the flag information of the item in the first queue corresponding to the end pointer: reserving, through the end pointer of the first queue, the position of the first item in the first queue for writing the target address.
For example, the data caching method provided in at least one embodiment of the present disclosure further includes, after storing in the fourth queue the flag information of the item in the second queue corresponding to the end pointer: reserving, through the end pointer of the second queue, a second item group including the first number of second items in the second queue for writing the target data corresponding to the target address.
For example, the data caching method provided in at least one embodiment of the present disclosure further includes, after writing the first sub-instruction and the second sub-instruction into the third queue and the fourth queue, respectively: executing the first sub-instruction to obtain the target address, and executing the second sub-instruction to obtain the target data.
For example, in a data caching method provided in at least one embodiment of the present disclosure, the writing the target address into the first item in the first queue includes: writing the target address into the first item reserved in the first queue based on the flag information corresponding to the first queue.
For example, in a data caching method provided in at least one embodiment of the present disclosure, the writing the target data into a second item group in a second queue independent of the first queue includes: writing the target data into the second item group reserved in the second queue based on the flag information corresponding to the second queue.
For example, the data caching method provided in at least one embodiment of the present disclosure further includes: after the target address is written into the first item and the target data is written into the second item group, providing the target address and the target data externally and setting a commit bit corresponding to the first item to a valid state.
For example, the data caching method provided in at least one embodiment of the present disclosure further includes, after the target address and the target data are provided externally: releasing the first item in the first queue and the second item group in the second queue, respectively, and setting the commit bit corresponding to the first item to an invalid state.
For example, the data caching method provided in at least one embodiment of the present disclosure further includes, after setting the commit bit corresponding to the first item to an invalid state: reclaiming the first item in the first queue through the start pointer of the first queue; and reclaiming, through the start pointer of the second queue, the second items included in the second item group in the second queue.
At least one embodiment of the present disclosure provides a data caching apparatus, including: the receiving module is configured to receive target data obtained by executing a target instruction and a target address corresponding to the target data; a first write module configured to write the target address into a first entry in a first queue; a second writing module configured to write the target data into a second item group in a second queue independent of the first queue, wherein the second item group includes at least one consecutive second item.
At least one embodiment of the present disclosure provides a data caching apparatus, including: a processing unit; a memory having executable instructions stored thereon; wherein the executable instructions, when executed by the processing unit, implement the data caching method as described above.
At least one embodiment of the present disclosure provides a processor including the data caching apparatus described above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure, not to limit the present disclosure.
FIG. 1 is a flow chart of a store instruction write queue;
FIG. 2 is a flow chart of store instruction execution;
FIG. 3A is a flow chart of a data caching method provided by at least one embodiment of the present disclosure;
FIG. 3B is a schematic diagram of a data caching apparatus provided by at least one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an address storage area and a data storage area of a decoupled store queue according to at least one embodiment of the present disclosure;
FIG. 5 is a flow chart of store instruction execution based on a decoupled store queue provided by at least one embodiment of the present disclosure;
Fig. 6 is a schematic block diagram of an electronic device provided in at least one embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments that can be obtained by one of ordinary skill in the art, based on the described embodiments of the present disclosure and without inventive effort, are within the scope of the present disclosure.
Unless defined otherwise, technical or scientific terms used in this disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising," "comprises," or the like means that the element or item preceding the word covers the elements or items listed after the word and equivalents thereof, without excluding other elements or items. The term "connected" or the like is not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper," "lower," "left," "right," etc. are used merely to indicate relative positional relationships, which may change when the absolute position of the object described changes.
The present disclosure is illustrated by the following several specific examples. Detailed descriptions of known functions and known parts (elements) may be omitted for the sake of clarity and conciseness in the following description of the embodiments of the present disclosure. When any part (element) of an embodiment of the present disclosure appears in more than one drawing, the part (element) is denoted by the same or similar reference numeral in each drawing.
Some of the english abbreviations and terms that are used in the present disclosure are explained and illustrated herein.
UopQ: a micro-instruction queue (Uop Queue). Machine instructions (often simply called "instructions") are decoded into one or more micro-instructions (Uops), and the decoded Uops are stored in the UopQ. For example, the queue is of the first-in-first-out type: if the Uop that entered first has not been fetched, the Uops that entered later wait.
STD: Store Data, the data-related micro-instruction portion of a store instruction. It is a Uop; after the Uop has been executed, the data information (Data) of the store instruction is obtained.
DataSize: the data width, i.e., the number of bits occupied by the STD.
STA: Store Address, the address-related micro-instruction portion of a store instruction, i.e., the store address of the data targeted by the store instruction; this portion also carries the data width (DataSize) and other necessary information. It is a Uop; after the Uop has been executed, the address information (Address) of the data of the store instruction is obtained.
STQ: Store Queue, used to temporarily hold the data (Data) to be written by a store instruction together with the corresponding data address (Address) and other information (DataSize, Committed, etc.). A store instruction is a type of instruction executable by a processor that writes data in the CPU into a cache or other designated hardware location.
DataGranual: data granularity, the maximum data width (DataSize) that a single STD item of the STQ can hold.
Dispatch: instruction dispatch, i.e., the process of fetching Uops from the UopQ and dispatching them, according to instruction type or other rules, to the next-level instruction dispatch queues, such as the AGSQ (Agen Sequence Queue, an address-generation micro-instruction queue for caching Uops related to address calculation), the ALSQ (Algorithm Sequence Queue, an arithmetic-operation micro-instruction queue for caching Uops related to arithmetic instructions), the FSQ (Float point Sequence Queue, a floating-point micro-instruction queue for caching Uops related to floating-point instructions), the LDQ (Load Sequence Queue, a load micro-instruction queue for caching Uops related to data-read instructions), and the SSQ (Store Sequence Queue, a store micro-instruction queue for caching Uops related to data-write instructions); these may be collectively referred to as issue queues (Issue Queue, IQ).
Dispatch Stall: suspension of dispatch, i.e., the phenomenon that a Uop cannot continue to be dispatched because a lower-level queue is full.
Entry: an entry in a Queue (Queue), also known as an "entry".
Token: the number of entries (Entry) in a queue (Queue) that are in the idle state and available for use.
During the operation of a processor, there are a large number of store-type instructions (also referred to herein simply as store instructions) for writing computed data into a cache or other hardware; after the final address and data are obtained, they are held in a store queue (STQ).
FIG. 1 shows a schematic diagram of an exemplary flow of processing a store instruction in a processor. The correspondence between the Data portion (Store Data, STD) obtained according to the Store instruction and the Address portion (Store Address, STA) obtained according to the Store instruction is shown in fig. 1.
After instruction fetch and Decode (Decode), store (store) instructions are typically broken down into 2 microinstructions (uops) in the instruction Dispatch stage: an address-dependent instruction portion (STA) of the store instruction and a data-dependent instruction portion (STD) of the store instruction (where the address-dependent instruction portion of the store instruction is typically 40 to 48bits, and the data-dependent portion of the store instruction is typically 64 to 256 bits). As described below, the Address-dependent instruction portion of the store instruction can acquire Address information (Address) of target Data after being executed, and the Data-dependent instruction portion of the store instruction can acquire Data information (Data) of target Data after being executed.
The two instruction portions (i.e., STA and STD microinstructions) are then dispatched to corresponding execution units for execution during the instruction Dispatch (Dispatch) stage, respectively. The following describes the instruction distribution flow in detail:
First, it is checked whether each of the AGSQ (Agen Sequence Queue, the address-calculation queue), the ALSQ (Algorithm Sequence Queue, the arithmetic-instruction queue), and the STQ (Store Queue) has at least one empty entry available (i.e., whether there is a token) for receiving the address-related instruction portion and the data-related instruction portion of the store instruction, respectively. At this point the STQ is only used for a reservation operation; the address information of the target data and the target data itself are written only after the address-related instruction portion and the data-related instruction portion of the store instruction have each been executed.
Second, once the number of available empty entries in the AGSQ and the ALSQ meets the requirement, the address-related instruction portion of the store instruction is written into the AGSQ and the data-related instruction portion is written into the ALSQ, where they wait to be scheduled into their respective execution units; meanwhile, the ID information of the STQ entries (Entry) occupied in advance by the address-related instruction portion and the data-related instruction portion of the store instruction is saved in the issue queue (IQ).
Finally, after the execution units complete execution, the address information (Address) and the data information (Data) of the target data are obtained, the resources occupied in the AGSQ and the ALSQ by the address-related and data-related instruction portions of the store instruction are released, and, according to the ID (identification information) of the STQ entry reserved earlier, the address information and the data information of the target data are written into the address information portion (InfoRegion) and the data information portion (DataRegion) of the same STQ entry, respectively, completing the buffering of the target data to be stored.
For example, the STQ has a circular buffer structure, and each STQ entry (Entry) may be designed to hold all the information of one store instruction, i.e., the sum of the bits occupied by the STA and the STD. The portion corresponding to the STA typically occupies no more than 48 bits, while the maximum width of the portion corresponding to the STD matches the widest data of the instruction set. For many CPUs supporting both normal instructions (64 bits) and SIMD instructions (Single Instruction Multiple Data, e.g., 128 bits), the portion of each entry corresponding to the STD can currently hold up to 128 bits, so that the information of each instruction can be stored in a single STQ entry. For larger widths, such as 512 bits to 2048 bits, the portion of each STQ entry corresponding to the STD would likewise have to be designed with the corresponding width, i.e., each STQ entry would match the highest data width to be supported (e.g., 512 bits); such a large data width inevitably causes a rapid increase in the chip area occupied by the STQ.
Meanwhile, when the CPU executes a store instruction with a conventional data bit width (e.g., 64 bits), only 64 bits of a data information portion (DataRegion) capable of holding 512 bits are used, which leads to low STQ resource utilization and wasted capacity; the high-bit-width hardware also makes timing design difficult, increasing cost.
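To make the problem concrete, a coupled STQ entry sized for the widest supported store could be modeled as in the hypothetical C++ sketch below; the field names follow the description above, the 48-bit address and 512-bit data region are the example figures just quoted, and the exact layout is an assumption rather than the patented design. A 64-bit store would fill only 64 of the 512 data bits, which is exactly the waste described above.

```cpp
#include <cstdint>

// Illustrative sketch only: one entry of the coupled STQ of FIG. 1,
// in which the address information portion (InfoRegion) and the data
// information portion (DataRegion) share the same entry.
struct CoupledStqEntry {
    uint64_t address : 48;   // InfoRegion: target address obtained from the STA
    uint32_t dataSize;       // width (in bits) of the data actually stored
    bool     committed;      // Committed bit
    uint64_t data[8];        // DataRegion: sized for the widest store (512 bits)
};
```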
For entries in which both the address information and the data information have been filled in, the STQ writes the data to the lower-level cache or other hardware in order, then sets the commit bit (Committed bit) of each corresponding entry to 1, and releases the STQ entry from the start pointer (Start Pointer) of the ring buffer; if an incomplete entry is encountered, the write operations and the release of entries stop.
In addition, a Circular Buffer (Circular Buffer) is a fixed-size storage Buffer defined as a ring, and when the volume of data written into the Buffer reaches an upper limit, newly written data will overwrite previously written data.
The STQ described above is suited to storing the address information and data information of the highest-bit-width store instructions, and the data information portion (DataRegion) of each STQ entry is designed for the highest bit width the supported instructions can use; however, when a store instruction narrower than that bit width is executed, the hardware resources of the data information portion (DataRegion) are wasted.
The inventors of the present disclosure have noted the above problem, and therefore in at least one embodiment of the present disclosure the above STQ is split by at least partially decoupling the address information portion (InfoRegion) and the data information portion (DataRegion) into 2 separate ring buffer structures. For example, the structure of the address information portion (InfoRegion) remains the same as before decoupling, while each entry of the data information portion (DataRegion) is designed for the data bit width of a normal instruction (the data granularity, DataGranual). For example, one item of address information occupies 1 entry of the address information portion, and one item of data occupies "DataSize (i.e., the width of the data) / DataGranual" entries of the data information portion. The embodiments of the present disclosure improve the storage efficiency of the data information portion, reduce the hardware area, save hardware design cost, and reduce system power consumption; alternatively, for the same area, the number of entries in the address information portion can be increased, so that the total number of store instructions the STQ can buffer increases, achieving the effect of more STQ entries within an unchanged area.
At least one embodiment of the present disclosure provides a data caching method, and fig. 3A shows a flowchart of the data caching method; referring to fig. 3A, the data caching method includes steps 100 to 300 as follows:
Step 100, receiving target data obtained by executing a target instruction and a target address corresponding to the target data;
step 200, writing the target address into a first item in a first queue;
step 300, writing target data into a second item group in a second queue independent of the first queue, wherein the second item group includes at least one consecutive second item.
Here, the "target instruction" is an instruction that is a current description object, for example, a store instruction, and accordingly, execution of the instruction results in target data and a target address corresponding to the target data. The first queue and the second queue are independent of each other, which means that the first queue and the second queue can be operated, controlled independently of each other, but can be physically located together, e.g. belonging to the same memory unit, e.g. comprising different parts of the same memory unit, respectively. The first queue includes a plurality of first entries; the second queue comprises a plurality of second items, one or more of which during operation (e.g. a plurality of second items adjacent to each other) logically form a second set of items, each second set of items storing one item of data as a whole, e.g. in case a plurality of second items adjacent to each other may form a second set of items, each second item of the second set storing a part of the data. In operation, a first item in the first queue corresponds to a second item group in the second queue, and the number of second items included in each second item group corresponding to the first item is related to the first item itself rather than is fixed.
The receiving module 101, the first writing module 102, the second writing module 103, and the like may be implemented by software, hardware, firmware, or any combination thereof.
For example, the data caching method provided in at least one embodiment of the present disclosure may further include: determining, based on the width of the target data and the width of a single item in the second queue, a first number of second items included in the second item group into which the target data is written in the second queue, and writing the first number into the first item.
In the above embodiment, the first number, i.e., how many second items of the second queue the second item group occupies, is determined based on the width of the target data and the width of a single item in the second queue (i.e., the maximum width of a single item). For example, the algorithm may include: dividing the width of the target data by the maximum width of a single entry in the second queue and rounding up yields the first number.
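As an illustration only, this rounded-up quotient can be written as a small helper; the function name, the constant, and the 64-bit single-entry width used for the example values are assumptions made here for the sketch, not requirements of the disclosure.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: number of second items (data-queue entries) occupied
// by target data of `dataSizeBits`, given the maximum width of a single
// entry (the data granularity). This is the rounded-up quotient described above.
constexpr uint32_t entriesNeeded(uint32_t dataSizeBits, uint32_t dataGranualBits) {
    return (dataSizeBits + dataGranualBits - 1) / dataGranualBits;
}

int main() {
    constexpr uint32_t kDataGranual = 64;                   // assumed entry width
    std::printf("%u\n", entriesNeeded(16, kDataGranual));   // prints 1
    std::printf("%u\n", entriesNeeded(88, kDataGranual));   // prints 2
    std::printf("%u\n", entriesNeeded(256, kDataGranual));  // prints 4
    return 0;
}
```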
For example, the data caching method provided in at least one embodiment of the present disclosure may further include: and writing a starting position corresponding to the second item group in the second queue into the first item. Thus, by reading the first item, the second item group can be located/addressed in the second queue accordingly, whereby the target data corresponding to the target address stored in the first item can be obtained.
In the above embodiment, the manner in which the first queue and the second queue correspond to each other includes, but is not limited to, that of a linear queue; after the start position of the data write and the data width/number of entries are obtained, an offset can be calculated from the start position to locate the entire data portion.
For example, in the data caching method provided in at least one embodiment of the present disclosure, the first queue and the second queue are both ring queues, the start pointer of the first queue and the start pointer of the second queue correspond to each other, the end pointer of the first queue and the end pointer of the second queue correspond to each other, and the first item in the first queue and the second item group in the second queue correspond to each other. Accordingly, the first queue includes a plurality of first items and a plurality of second item groups corresponding to the second queue, so that the plurality of first items in the first queue and the plurality of second item groups in the second queue are in one-to-one correspondence, for example, the second item groups corresponding to the first items can be located in the second queue through information recorded by each first item in the first queue.
In the above embodiment, the initial positions of the start pointer of the first queue and the start pointer of the second queue and the movements after the subsequent operations are in one-to-one correspondence, and the initial positions of the end pointer of the first queue and the end pointer of the second queue and the movements after the subsequent operations are also in one-to-one correspondence, so as to reserve or release the entries.
For example, the data caching method provided in at least one embodiment of the present disclosure may further include, before receiving target data obtained by executing the target instruction and a target address corresponding to the target data: decomposing the target instruction to obtain a first sub-instruction and a second sub-instruction, wherein the first sub-instruction can acquire a target address after being executed, and the second sub-instruction can acquire target data after being executed; a determination is made as to whether the number of items in the first queue that are in the idle state and the number of items in the second queue that are in the idle state are sufficient to write the target address and the target data, respectively, based on the first number. For example, the first sub-instruction and the second sub-instruction are both micro-instructions (uops).
For example, the data caching method provided in at least one embodiment of the present disclosure may further include, before receiving target data obtained by executing the target instruction and a target address corresponding to the target data: writing the first sub-instruction and the second sub-instruction into a third queue for executing the first sub-instruction and a fourth queue for executing the second sub-instruction, respectively, to wait for subsequent execution, in the case that the number of items in the first queue in the idle state and the number of items in the second queue in the idle state are sufficient to be able to write the target address and the target data simultaneously; alternatively, in the case where at least one of the number of items in the first queue in the idle state and the number of items in the second queue in the idle state is insufficient to enable simultaneous writing of the target address and the target data, the processing of the first sub-instruction and the second sub-instruction is suspended.
For example, the data caching method provided in at least one embodiment of the present disclosure may further include, before receiving target data obtained by executing the target instruction and a target address corresponding to the target data: after the first sub-instruction is written into the third queue, storing in the third queue, for the first sub-instruction, the flag information of the item in the first queue corresponding to the end pointer; and after the second sub-instruction is written into the fourth queue, storing in the fourth queue, for the second sub-instruction, the flag information of the item in the second queue corresponding to the end pointer.
For example, the data caching method provided in at least one embodiment of the present disclosure may further include, after storing in the third queue the flag information of the item in the first queue corresponding to the end pointer: reserving, through the end pointer of the first queue, the position of the first item in the first queue for writing the target address.
For example, the data caching method provided in at least one embodiment of the present disclosure may further include, after storing in the fourth queue the flag information of the item in the second queue corresponding to the end pointer: reserving, through the end pointer of the second queue, a second item group including the first number of second items in the second queue for writing the target data corresponding to the target address.
For example, the data caching method provided in at least one embodiment of the present disclosure may further include, after writing the first sub-instruction and the second sub-instruction into the third queue and the fourth queue, respectively: executing the first sub-instruction to obtain the target address, and executing the second sub-instruction to obtain the target data.
For example, in the data caching method provided in at least one embodiment of the present disclosure, writing the target address into the first item of the first queue may include: writing the target address into the first item reserved in the first queue based on the flag information corresponding to the first queue.
For example, in a data caching method provided in at least one embodiment of the present disclosure, writing the target data into a second item group in a second queue independent of the first queue may include: writing the target data into the second item group reserved in the second queue based on the flag information corresponding to the second queue.
For example, the data caching method provided in at least one embodiment of the present disclosure may further include: after the target address is written into the first item and the target data is written into the second item group, providing the target address and the target data externally and setting the commit bit corresponding to the first item to a valid state.
For example, the data caching method provided in at least one embodiment of the present disclosure may further include, after the target address and the target data are provided externally: releasing the first item in the first queue and the second item group in the second queue, respectively, and setting the commit bit corresponding to the first item to an invalid state.
For example, the data caching method provided in at least one embodiment of the present disclosure may further include, after setting the commit bit corresponding to the first item to an invalid state: reclaiming the first item in the first queue through the start pointer of the first queue; and reclaiming, through the start pointer of the second queue, the second items included in the second item group in the second queue.
Fig. 4 is a schematic diagram of two decoupled queues for an address storage area and a data storage area, respectively, provided in at least one embodiment of the present disclosure.
As shown in FIG. 4, in the embodiment of the present disclosure, the store queue is at least partially decoupled, yielding two independent ring buffer structures whose contents correspond to each other: an address storage queue (an example of the first queue in this disclosure) and a data storage queue (an example of the second queue in this disclosure).
The address storage queue (first queue) includes a plurality of sequentially arranged items (first items), each of the same size and including an Address field, a DataSize field, a Committed field, and other fields as required; the data storage queue includes a plurality of sequentially arranged items (second items), each of the same size and including a data field, and a single second item or a plurality of mutually adjacent second items may constitute a second item group. In this embodiment, the address information (Address) of an item of data is decoupled from the corresponding data information (Data) in the storage device; it is no longer necessary to store the address information and the data information in the same entry of the same queue as shown in FIG. 1. Instead, in the embodiment of the present disclosure, the address information is stored in one queue and the data information is stored in the other queue, but the two still correspond to each other and can therefore be operated on synchronously.
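Purely as an illustration, the two decoupled entry layouts could be modeled as below; the field names follow this paragraph, while the concrete widths (a 48-bit address and a 64-bit data granularity) are assumptions chosen to match the examples of FIG. 4, not limitations of the disclosure.

```cpp
#include <cstdint>

// Illustrative sketch: one "first item" of the address storage queue and
// one "second item" of the data storage queue after decoupling.
struct AddressEntry {        // first item (address storage queue)
    uint64_t address : 48;   // Address field: the target address
    uint32_t dataSize;       // DataSize field: width of the stored data, in bits
    bool     committed;      // Committed field: set once the data has been provided
    // ... other fields as required
};

struct DataEntry {           // second item (data storage queue)
    uint64_t data;           // holds at most DataGranual (here 64) bits of payload
};
```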
The data storage queue and the address storage queue in FIG. 4 each have their own start pointer (StartPointer) and end pointer (EndPointer). The start pointer marks the start of all valid entries and is at the same time the release position of the queue; the end pointer marks the end of all valid entries and is at the same time the position of the next writable entry. For a ring buffer queue (or ring buffer), in the initialization state the start pointer and the end pointer point to the same position; when data is written into the item pointed to by the end pointer, the end pointer moves backward in the queue by at least one item position to reserve the next write position, and when data is released from the item pointed to by the start pointer, the start pointer moves backward in the queue by at least one item position.
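A minimal sketch of this pointer discipline follows; the template, the modulo wrap-around, and the explicit occupancy counter are implementation assumptions made for illustration and are not prescribed by the disclosure.

```cpp
#include <array>
#include <cstdint>

// Hypothetical ring-buffer queue: the end pointer reserves entries on write,
// the start pointer releases them in order.
template <typename Entry, uint32_t N>
struct RingQueue {
    std::array<Entry, N> entries{};
    uint32_t startPtr = 0;   // first valid entry; release position
    uint32_t endPtr   = 0;   // next writable entry
    uint32_t used     = 0;   // number of occupied entries

    uint32_t freeTokens() const { return N - used; }   // idle entries (tokens)

    // Reserve `n` consecutive entries; returns the index of the first one.
    uint32_t reserve(uint32_t n) {
        uint32_t pos = endPtr;
        endPtr = (endPtr + n) % N;
        used += n;
        return pos;
    }

    // Release `n` entries in order, starting from the start pointer.
    void release(uint32_t n) {
        startPtr = (startPtr + n) % N;
        used -= n;
    }
};
```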
For example, in at least one example, the start pointer of the data storage queue and the start pointer of the address storage queue correspond to each other and move synchronously; similarly, the end pointer of the data storage queue and the end pointer of the address storage queue correspond to each other and move synchronously. That is, when the start pointer of the data storage queue moves because of a data change, the start pointer of the address storage queue also moves, and when the end pointer of the data storage queue moves because of a data change, the end pointer of the address storage queue also moves, so that the two queues point to the data information and the address information of the same piece of data. However, as described above, one first item corresponds to one second item group, and a second item group includes one or more second items; therefore, during dynamic operation, the number of items by which the start pointer of the data storage queue moves is not necessarily the same as the number of items by which the start pointer of the address storage queue moves, and likewise for the end pointers; the specific numbers are determined based on the width of the target data and the maximum width of a single item in the data storage queue.
The example shown in FIG. 4 assumes that the maximum data width each entry of the data storage queue can hold is 64 bits. For example, if the width of the target data whose address information is 0x400 in the address storage queue is 16 bits, the data information Data0 corresponding to that target data occupies 16 bits in the data storage queue, and the target data needs only one item in the data storage queue (the second item has free bits after being used), so the corresponding second item group includes only one second item; therefore, when inserting or deleting this target data, the pointer (start pointer or end pointer) of the address storage queue needs to move by only one item, and the pointer (start pointer or end pointer) of the data storage queue also needs to move by only one item.
For another example, the width of the target data whose address information is 0x320 in the address storage queue is 88 bits; since the maximum data width of a single entry (i.e., item) is 64 bits, this target data uses two entries in the data storage queue (i.e., the entries in which Data5 and Data6 are located), where Data5 stores 64 bits of the data and Data6 stores the remaining 24 bits (so that second item has free bits after being used), thereby storing the 88 bits of the target data as a whole; the corresponding second item group therefore includes 2 second items. Accordingly, when inserting or deleting this target data, the pointer (start pointer or end pointer) of the address storage queue needs to move by only one entry, while the pointer (start pointer or end pointer) of the data storage queue needs to move by 2 entries.
For another example, the width of target Data whose address information is 0x308 in the address storage queue is 256 bits, and thus the target Data uses four entries (i.e., entries where Data13, data14, data15, and Data16 are located) in the Data storage queue since the maximum Data width of a single entry (i.e., item) is 64bits, 64bits of Data are stored in each entry (i.e., the second item has no spare bits after being used), thereby storing 256 bits of Data corresponding to the target Data as a whole, and thus the corresponding second item group includes 4 second items. And therefore, when inserting or deleting the target data, the pointer (start pointer or end pointer) of the address storage queue needs to be moved only by one entry, and the pointer (start pointer or end pointer) of the data storage queue needs to be moved by 4 entries.
The bit width of each entry in the second queue may also be set smaller than the minimum bit width specified by conventional instructions. For example, if the bit width of a conventional instruction is 64 bits, the maximum bit width of an entry in the second queue may be set to 32 bits, so that when the bit width of the data operated on by a conventional instruction (i.e., the target instruction) is <= 32 bits, only a single entry in the second queue is occupied and the actually occupied resource is 32 bits; compared with setting the maximum entry bit width of the second queue to a larger value such as 64 bits, the number of spare bits is reduced, which improves the resource utilization efficiency of the target queue.
In other embodiments of the present disclosure, such as for data storage in a CPU or other data storage device, a similar approach may be used in which the variable bit-width portion is decoupled from the fixed bit-width portion, reducing overall resource consumption.
FIG. 5 is a flow chart of store instruction execution based on a decoupled store queue provided in at least one embodiment of the present disclosure. As shown in fig. 5, the method comprises the following steps:
after the store instruction (an example of a target instruction in this disclosure) is fetched, the store instruction is decoded (step 500).
Then, from the decoded store instruction, the number of tokens (Token) required by the data-related instruction portion of the store instruction (i.e., the STD micro-instruction) in the data storage queue (an example of the second queue in this disclosure) of the store queue (STQ) (i.e., the decoupled address storage queue and data storage queue described above, taken together) is calculated (step 501). For example, from the bit width of the target data of the store instruction and the bit width of each entry of the data storage queue, the number of second items needed to store the target data (e.g., denoted M), i.e., the number of tokens, can be derived.
After the calculation, it is determined whether the number of free items available in each of the AGSQ (an example of the third queue in this disclosure), the ALSQ (an example of the fourth queue in this disclosure), and the STQ (the store queue, including the mutually decoupled address storage queue (an example of the first queue in this disclosure) and data storage queue described above) is sufficient; that is, each of these queues must satisfy the execution conditions of the aforementioned store instruction.
In the event that there are insufficient free entries available in either queue, the data caching process is suspended (step 504).
In the event that the number of free entries available in every queue is sufficient, the address-related instruction portion (the STA micro-instruction, an example of the first sub-instruction in this disclosure) and the data-related instruction portion (the STD micro-instruction, an example of the second sub-instruction in this disclosure) are written into the AGSQ and the ALSQ, respectively (step 505).
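To illustrate steps 501 to 505, the following hypothetical sketch performs the token check before dispatch; the structure, the counter names, and the assumption that the STA and the STD each need exactly one entry in their issue queues are made up for this example.

```cpp
#include <cstdint>

// Free-entry (token) counters of the queues involved in dispatching a store.
struct FreeTokens {
    uint32_t agsq;        // free entries in the AGSQ (for the STA)
    uint32_t alsq;        // free entries in the ALSQ (for the STD)
    uint32_t addrQueue;   // free first items in the address storage queue
    uint32_t dataQueue;   // free second items in the data storage queue
};

// Returns true if the STA/STD pair can be dispatched; false means a dispatch stall (step 504).
bool canDispatchStore(const FreeTokens& t, uint32_t dataSizeBits, uint32_t dataGranualBits) {
    // Tokens the STD needs in the data storage queue (step 501): rounded-up quotient.
    const uint32_t m = (dataSizeBits + dataGranualBits - 1) / dataGranualBits;
    return t.agsq >= 1 && t.alsq >= 1 &&   // one entry each for the STA and STD uops
           t.addrQueue >= 1 &&             // one first item for the target address
           t.dataQueue >= m;               // m second items for the target data
}
```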
The STA waits to be scheduled and executed (step 506); after the execution conditions are met, it is scheduled to an execution unit for execution, and if execution completes, the target address is obtained (step 508), otherwise it continues to wait to be scheduled and executed. After the STA finishes execution, the entry occupied by the STA in the AGSQ is released, and the obtained target address is written into one item (a first item) of the address storage queue (step 510).
The STD waits to be scheduled and executed (step 507); it is scheduled to an execution unit for execution, and if execution completes, the target data corresponding to the target address is obtained (step 509), otherwise it continues to wait to be scheduled and executed. After the STD finishes execution, the entries occupied by the STD in the ALSQ are released, and the target data is written into M items (second items) of the data storage queue (M = "width of the target data / maximum width of a single item in the second queue", rounded up); these M second items constitute the second item group (step 511).
It is then determined whether both the address and the data of a given STQ item have been obtained, i.e., whether both the address information of a first item of the address storage queue and the data information of its corresponding second items of the data storage queue have been written (step 512); if so, the target data is written to the cache or other hardware according to the target address (step 513).
Thereafter, the commit bit (Committed) corresponding to the above STQ item is set to "1" (step 514) to indicate that the entry is in a valid state and has been committed for execution.
Finally, the corresponding entries occupied in the address store queue and the data store queue in the STQ are sequentially released (step 515), i.e., the first item and the corresponding second item group previously occupied are released, completing the execution of the instruction.
In the above embodiment, before the target instruction is obtained and the target data is obtained from it, the STA (i.e., the first sub-instruction) needs enough available entries in the AGSQ (i.e., the third queue), and the STD (i.e., the second sub-instruction) likewise needs enough available entries in the ALSQ (i.e., the fourth queue). Further, it must be determined whether the available entries in the address storage queue and the data storage queue of the STQ, which will hold the address information and the data information obtained after the STA and the STD are executed, are sufficient, and the determination result obtained; if the available entries required by any of the queues are insufficient, the process waits. Available entries are entries in the idle state into which data can be written.
In the above embodiment, when the requirement on the number of free available entries in the first queue and the second queue is satisfied, the STA is written into the AGSQ, and the ID (i.e., identification information) of the entry at the current end pointer (EndPointer) of the address storage queue (InfoRegion), into which the address information obtained by executing the STA will be stored, may be saved for the STA in an entry of the AGSQ; the STD is written into the ALSQ, and the ID of the entry at the current end pointer (EndPointer) of the data storage queue (DataRegion), into which the data information obtained by executing the STD will be stored, is saved for the STD in an entry of the ALSQ. These IDs are later used as the flag information for writing the target address and the target data into the address storage queue and the data storage queue, respectively.
It should be noted that the manner in which the target address and the target data are made to correspond to each other is not limited in this disclosure; for example, the following three methods may be used:
1. The same label (for example, a numeric ID) to be assigned to the target address and the target data is obtained in the decoding or instruction decomposition stage, and the label is added to the corresponding first item and second item when the target address and the target data are stored in the first queue and the second queue, respectively; the correspondence between the first queue and the second queue can later be looked up by this label to obtain the correspondence between the target address and the target data.
2. After the write-target ID of the data storage queue (second queue) is obtained for the STD (i.e., the ID of the entry at the end pointer (EndPointer) of the data storage queue (DataRegion)), the address storage queue (first queue) is notified of this ID and the ID is appended to the corresponding first item; the specific location in the second queue can later be obtained from the information in the first queue combined with the width of the target data.
3. Because the address information and the data information are each stored contiguously in their respective queues, the address information in the initial state is in Entry0 of the first queue and the corresponding data information starts at Entry0 of the second queue; the number of occupied entries is obtained from the width of the target data divided by the maximum bit width of a single item of the second queue. Assuming that data information occupies Entry0 to Entry3, the data corresponding to the address information in Entry1 of the first queue starts at Entry4 of the second queue, and its specific range can be obtained from the data width (DataSize) recorded in Entry1; that is, the exact position of the corresponding data information in the data storage queue can be derived from the width (DataSize) of the data recorded in the entry where the address information is located. For example, after this calculation has been performed once, a pointer to the position of the data information may be stored in the first queue for the next lookup.
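A sketch of this third method is given below, under the assumption that the DataSize values of the older first items are available in queue order starting from the address queue's start position; the function and parameter names are illustrative only.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical lookup: find the first second item of the data belonging to
// the first item at offset `target` from the address queue's start pointer,
// by accumulating how many data-queue entries each older store occupies.
uint32_t locateDataStart(const std::vector<uint32_t>& dataSizes,  // DataSize per older first item, oldest first
                         uint32_t target,                          // offset of the first item of interest
                         uint32_t dataStartPtr,                    // start pointer of the data storage queue
                         uint32_t dataQueueSize,
                         uint32_t dataGranualBits) {
    uint32_t offset = 0;
    for (uint32_t i = 0; i < target; ++i) {
        offset += (dataSizes[i] + dataGranualBits - 1) / dataGranualBits;  // entries used by store i
    }
    return (dataStartPtr + offset) % dataQueueSize;  // index of the group's first second item
}
```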
In the above embodiment, after the ID of the entry at the current end pointer (EndPointer) of the address storage queue is saved for the STA in an entry of the AGSQ, the entry pointed to by that end pointer is reserved in the address storage queue (first queue) to hold the target address obtained after the STA executes, for the later writing of the target address.
In the above embodiment, after the STD is written into the ALSQ and the ID of the entry at the current end pointer (EndPointer) of the data storage queue is saved for the STD in an entry of the ALSQ, M entries starting from the entry pointed to by that end pointer are reserved in the data storage queue (second queue) to hold the target data obtained after the STD executes, for the later writing of the target data.
In the above embodiment, when an STQ entry is released, the numbers of positions by which the start pointer of the address storage queue and the start pointer of the data storage queue move backward may differ: the start pointer of the address storage queue moves backward by 1 entry, while the start pointer of the data storage queue must move backward by N (= DataSize / entry width of the data storage queue) entries according to the width (DataSize) of the stored data information; the token counts of the two queues may be updated at the same time if necessary. For example, in the case shown in FIG. 4, when the STQ entry is released, the start pointer of the address storage queue moves backward by 1 entry, i.e., to the entry whose address information is "0x708"; accordingly, the start pointer of the data storage queue must move backward by 4 (= 256/64) entries, from the entry where Data16 is located to the entry where Data12 is located.
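As a final illustration, the release step could be sketched as follows; the names, the modulo arithmetic, and passing DataSize explicitly are assumptions for this example rather than the disclosed implementation.

```cpp
#include <cstdint>

// Hypothetical release of one STQ entry: the address queue's start pointer
// advances by one first item, the data queue's start pointer by
// N = ceil(DataSize / entry width) second items (4 for the 256-bit example).
void releaseStqEntry(uint32_t& addrStartPtr, uint32_t addrQueueSize,
                     uint32_t& dataStartPtr, uint32_t dataQueueSize,
                     uint32_t dataSizeBits, uint32_t dataGranualBits) {
    const uint32_t n = (dataSizeBits + dataGranualBits - 1) / dataGranualBits;
    addrStartPtr = (addrStartPtr + 1) % addrQueueSize;  // release the first item
    dataStartPtr = (dataStartPtr + n) % dataQueueSize;  // release the n second items
}
```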
Correspondingly, at least one embodiment of the present disclosure further provides a data caching apparatus, and fig. 3B shows a schematic diagram of the data caching apparatus. Referring to fig. 3B, the data caching apparatus 10 includes a receiving module 101, a first writing module 102, and a second writing module 103.
The receiving module 101 is configured to receive target data obtained by executing a target instruction and a target address corresponding to the target data;
the first write module 102 is configured to write a target address into a first entry in a first queue;
the second writing module 103 is configured to write the target data into a second item group in a second queue independent of the first queue, wherein the second item group comprises at least one consecutive second item.
For example, the data caching apparatus provided by at least one embodiment of the present disclosure may further include a determining module configured to determine a first number of second items included in a second item group written to the second queue based on a width of the target data and a width of a single item in the second queue, and write the first number to the first item.
For example, the data caching apparatus provided in at least one embodiment of the present disclosure may further include a third writing module configured to write a start position corresponding to the second item group in the second queue into the first item.
For example, the first queue and the second queue are both ring queues, the start pointer of the first queue corresponds to the start pointer of the second queue, the end pointer of the first queue corresponds to the end pointer of the second queue, and first items in the first queue correspond to second item groups in the second queue one to one. Accordingly, the first queue includes a plurality of first items and the second queue includes a plurality of corresponding second item groups, so that the plurality of first items in the first queue and the plurality of second item groups in the second queue are in one-to-one correspondence; for example, the second item group corresponding to a first item can be located in the second queue through the information recorded by that first item in the first queue.
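This correspondence can be pictured with a hypothetical entry layout (the field names and widths below are assumptions for illustration): each first item records enough information to locate its second item group in the data ring.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed layout of one first item in the address (first) queue. */
typedef struct {
    uint64_t target_address;   /* destination address of the store           */
    uint16_t data_start;       /* start position of its second item group    */
    uint16_t data_count;       /* "first number": second items in the group  */
    uint16_t data_size;        /* DataSize of the target data, in bits       */
    bool     commit;           /* commit bit: data may be provided externally */
} addr_entry_t;

/* Locate the second item group of address entry e in a data ring of the
 * given depth; first/last receive the indices of the group's endpoints. */
static void group_range(const addr_entry_t *e, uint32_t depth,
                        uint32_t *first, uint32_t *last)
{
    *first = e->data_start;
    *last  = (e->data_start + e->data_count - 1) % depth;
}
```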
For example, the data caching apparatus provided in at least one embodiment of the present disclosure may further include a decoding unit and a dispatch unit. The decoding unit is configured to decode the target instruction to obtain a first sub-instruction and a second sub-instruction before the target data obtained by executing the target instruction and the target address corresponding to the target data are received, where the target address is obtained after the first sub-instruction is executed and the target data is obtained after the second sub-instruction is executed; the dispatch unit is configured to determine, based on the first number, whether the number of items in the idle state in the first queue and the number of items in the idle state in the second queue are respectively sufficient to allow the target address and the target data to be written at the same time.
For example, in the data caching apparatus provided in at least one embodiment of the present disclosure, before the target data obtained by executing the target instruction and the target address corresponding to the target data are received, the dispatch unit is further configured to write the first sub-instruction and the second sub-instruction into a third queue for executing the first sub-instruction and a fourth queue for executing the second sub-instruction, respectively, to wait for subsequent execution, in the case where the number of items in the idle state in the first queue and the number of items in the idle state in the second queue are sufficient to allow the target address and the target data to be written at the same time; or is further configured to abort processing of the first sub-instruction and the second sub-instruction in the case where at least one of those two numbers is insufficient to allow the target address and the target data to be written at the same time.
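A minimal sketch of this dispatch-time check, assuming hypothetical counters of free (idle) entries for the two queues:

```c
#include <stdbool.h>
#include <stdint.h>

/* The store is dispatched only when the address queue has at least one free
 * entry and the data queue has at least first_number free entries. */
static bool can_dispatch(uint32_t addr_free_entries, uint32_t data_free_entries,
                         uint32_t first_number)
{
    return addr_free_entries >= 1 && data_free_entries >= first_number;
}
```

Only when this condition holds are the two sub-instructions written into the third and fourth queues; otherwise processing of both is aborted until entries are released.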
For example, the data caching apparatus provided in at least one embodiment of the present disclosure may further include a first saving module and a second saving module. The first saving module is configured to, before the target data obtained by executing the target instruction and the target address corresponding to the target data are received and after the first sub-instruction is written into the third queue, save in the third queue the flag information of the item in the first queue corresponding to the end pointer for the first sub-instruction; the second saving module is configured to, after the second sub-instruction is written into the fourth queue, save in the fourth queue the flag information of the item in the second queue corresponding to the end pointer for the second sub-instruction.
For example, the data caching apparatus provided in at least one embodiment of the present disclosure may further include a first reservation unit configured to, after the flag information of the item corresponding to the end pointer of the first queue is saved for the first sub-instruction in the third queue, reserve in the first queue, through the end pointer of the first queue, the position of the first item for writing the target address.
For example, the data caching apparatus provided in at least one embodiment of the present disclosure may further include a second reservation unit configured to, after the flag information of the item corresponding to the end pointer of the second queue is saved for the second sub-instruction in the fourth queue, reserve in the second queue, through the end pointer of the second queue, a second item group including the first number of second items for writing the target data corresponding to the target address.
For example, the data caching apparatus provided in at least one embodiment of the present disclosure may further include an execution module configured to, after the first sub-instruction and the second sub-instruction are written into the third queue and the fourth queue respectively, execute the first sub-instruction to obtain the target address and execute the second sub-instruction to obtain the target data. For example, the execution module includes various execution units such as an arithmetic logic unit (ALU), a load/store unit (LSU), an address generation unit (AGU), and the like.
For example, in the data caching apparatus provided in at least one embodiment of the present disclosure, the first writing module is further configured to write the target address into a first entry reserved in the first queue based on the flag information corresponding to the first queue.
For example, in the data caching apparatus provided in at least one embodiment of the present disclosure, the second writing module is further configured to write the target data into the second item group reserved in the second queue based on the flag information corresponding to the second queue.
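As an illustrative sketch only (the array sizes, the 64-bit entry width, and the function name are assumptions), the flag information saved at dispatch time can serve as direct indices so that the write-back needs no search of either queue:

```c
#include <stdint.h>
#include <string.h>

#define ADDR_DEPTH        16   /* assumed depth of the first (address) queue */
#define DATA_DEPTH        64   /* assumed depth of the second (data) queue   */
#define DATA_ENTRY_BYTES   8   /* assumed 64-bit data-queue entries          */

static uint64_t addr_queue[ADDR_DEPTH];                    /* first queue  */
static uint8_t  data_queue[DATA_DEPTH][DATA_ENTRY_BYTES];  /* second queue */

/* addr_id / data_id are the flag information saved in the third and fourth
 * queues; they select the reserved first item and the first entry of the
 * reserved second item group. */
static void write_back(uint32_t addr_id, uint64_t target_address,
                       uint32_t data_id, const uint8_t *target_data,
                       uint32_t data_bytes)
{
    addr_queue[addr_id % ADDR_DEPTH] = target_address;     /* reserved first item */

    for (uint32_t off = 0; off < data_bytes; off += DATA_ENTRY_BYTES) {
        uint32_t idx = (data_id + off / DATA_ENTRY_BYTES) % DATA_DEPTH;
        uint32_t len = data_bytes - off < DATA_ENTRY_BYTES
                         ? data_bytes - off : DATA_ENTRY_BYTES;
        memcpy(data_queue[idx], target_data + off, len);    /* fill the group */
    }
}
```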
For example, the data caching apparatus provided in at least one embodiment of the present disclosure may further include a state change module configured to, after the target address is written into the first item and the target data is written into the second item group, provide the target address and the target data externally and set the commit bit corresponding to the first item to a valid state.
For example, the data caching apparatus provided in at least one embodiment of the present disclosure may further include a release module configured to, after the target address and the target data are provided externally, release the first item in the first queue and the second item group in the second queue, respectively, and set the commit bit corresponding to the first item to an invalid state.
For example, the data caching apparatus provided in at least one embodiment of the present disclosure may further include a first reclamation module and a second reclamation module. The first reclamation module is configured to reclaim the first item in the first queue through the start pointer of the first queue after the commit bit corresponding to the first item is set to an invalid state; the second reclamation module is configured to reclaim the second items included in the second item group in the second queue through the start pointer of the second queue.
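The state transitions of the commit bit and the subsequent reclamation can be sketched as follows, again using the assumed ring_queue_t and field names from the earlier sketches; this is an illustration, not the embodiment's actual logic:

```c
#include <stdbool.h>
#include <stdint.h>

/* first_item_commit is the commit bit of the first item; group_size is the
 * first number of second items in its second item group. */
static void commit_then_reclaim(ring_queue_t *addr_q, ring_queue_t *data_q,
                                bool *first_item_commit, uint32_t group_size)
{
    *first_item_commit = true;     /* address and data written: ready to provide */

    /* ... the target address and target data are provided externally here ... */

    *first_item_commit = false;    /* release: entry contents no longer valid    */
    addr_q->start  = (addr_q->start + 1) % addr_q->depth;           /* 1 item    */
    data_q->start  = (data_q->start + group_size) % data_q->depth;  /* the group */
    addr_q->tokens += 1;
    data_q->tokens += group_size;
}
```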
At least one embodiment of the present disclosure further provides a data caching device, where the data caching device includes a processing unit and a memory, and executable instructions are stored on the memory; wherein the executable instructions, when executed by the processing unit, implement the data caching method as described above.
At least one embodiment of the present disclosure further provides a non-transitory readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, implement a data caching method as in any one of the embodiments above.
For example, at least one embodiment of the present disclosure provides a processor including the data caching apparatus of any one of the embodiments described above.
For example, the processor of at least one embodiment of the present disclosure may further include a fetch unit, other types of execution units (for example, a multiplication unit), a register rename unit, and the like, and may further include various levels of caches (for example, an L1 cache and an L2 cache), a branch prediction unit, and the like, which are not described in detail herein.
Embodiments of the present disclosure do not limit the type of instruction set or microarchitecture employed by the processor; for example, a CISC microarchitecture or a RISC microarchitecture may be employed, such as an X86-type, ARM-type, or RISC-V-type microarchitecture.
At least some embodiments of the present disclosure also provide an electronic device comprising a processor of any one of the embodiments described above. Fig. 6 is a schematic block diagram of an electronic device provided in at least one embodiment of the present disclosure.
The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device 1000 shown in fig. 6 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
For example, as shown in fig. 6, in some examples, an electronic device 1000 includes a processing device (e.g., a central processing unit, a graphics processor, etc.) 1001, which may include a processor of any of the above embodiments and which may perform various suitable actions and processes in accordance with a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1008 into a random access memory (RAM) 1003. Various programs and data required for the operation of the computer system are also stored in the RAM 1003. The processor 1001, the ROM 1002, and the RAM 1003 are connected to each other through a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
For example, the following components may be connected to the I/O interface 1005: an input device 1006 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; an output device 1007 including a liquid crystal display (LCD), a speaker, a vibrator, and the like; a storage device 1008 including, for example, a magnetic tape, a hard disk, and the like; and a communication device 1009, which may include a network interface card such as a LAN card or a modem. The communication device 1009 may allow the electronic device 1000 to communicate wirelessly or by wire with other apparatuses to exchange data, performing communication processing via a network such as the internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read therefrom is installed into the storage device 1008 as needed. While fig. 6 illustrates the electronic device 1000 as including various devices, it should be understood that not all of the illustrated devices are required to be implemented or included; more or fewer devices may alternatively be implemented or included.
For example, the electronic device 1000 may further include a peripheral interface (not shown), and the like. The peripheral interface may be any of various types of interfaces, such as a USB interface, a lightning interface, etc. The communication device 1009 may communicate with a network, such as the internet, an intranet, and/or a wireless network, such as a cellular telephone network, a wireless local area network (LAN), and/or a metropolitan area network (MAN), and with other devices via wireless communication. The wireless communication may use any of a variety of communication standards, protocols, and technologies, including, but not limited to, Global System for Mobile communications (GSM), Enhanced Data GSM Environment (EDGE), wideband code division multiple access (W-CDMA), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi (e.g., based on the IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and/or IEEE 802.11n standards), voice over Internet protocol (VoIP), Wi-MAX, protocols for email, instant messaging, and/or Short Message Service (SMS), or any other suitable communication protocol.
For example, the electronic device 1000 may be any device such as a mobile phone, a tablet computer, a notebook computer, an electronic book, a game console, a television, a digital photo frame, a navigator, or any combination of a data processing device and hardware, which is not limited in the embodiments of the present disclosure.
While the disclosure has been described in detail with the above general description and specific embodiments, it will be apparent to those skilled in the art that certain modifications and improvements may be made on the basis of the embodiments of the disclosure. Accordingly, such modifications and improvements, made without departing from the spirit of the disclosure, are intended to fall within the scope claimed by the disclosure.
For the purposes of this disclosure, the following points are also noted:
(1) The drawings of the embodiments of the present disclosure relate only to the structures related to the embodiments of the present disclosure, and other structures may refer to the general design.
(2) In the drawings for describing embodiments of the present disclosure, the thickness of layers or regions is exaggerated or reduced for clarity, i.e., the drawings are not drawn to actual scale.
(3) The embodiments of the present disclosure and features in the embodiments may be combined with each other to arrive at a new embodiment without conflict.
The foregoing is merely specific embodiments of the disclosure, but the scope of the disclosure is not limited thereto, and the scope of the disclosure should be determined by the claims.
Claims (18)
1. A data caching method, comprising:
Receiving target data obtained by executing a target instruction and a target address corresponding to the target data;
writing the target address into a first entry in a first queue;
the target data is written into a second item group in a second queue independent from the first queue, wherein the second item group comprises at least one continuous second item.
2. The data caching method of claim 1, further comprising:
determining, based on the width of the target data and the width of a single item in the second queue, a first number of second items included in the second item group into which the target data is written in the second queue, and writing the first number into the first item.
3. The data caching method of claim 2, further comprising:
and writing a starting position corresponding to the second item group in the second queue into the first item.
4. The data caching method according to claim 2, wherein the first queue and the second queue are both ring-shaped queues, a start pointer of the first queue and a start pointer of the second queue correspond to each other, an end pointer of the first queue and an end pointer of the second queue correspond to each other, and a first item in the first queue and a second item group in the second queue correspond to each other.
5. The data caching method according to claim 3 or 4, further comprising, before receiving the target data obtained by executing the target instruction and the target address corresponding to the target data:
decomposing the target instruction to obtain a first sub-instruction and a second sub-instruction, wherein the target address can be obtained after the first sub-instruction is executed, and the target data can be obtained after the second sub-instruction is executed;
and respectively judging whether the number of the items in the idle state of the first queue and the number of the items in the idle state of the second queue are sufficient to simultaneously write the target address and the target data based on the first number.
6. The data caching method according to claim 5, further comprising, before receiving the target data obtained by executing the target instruction and the target address corresponding to the target data:
writing the first sub-instruction and the second sub-instruction into a third queue for executing the first sub-instruction and a fourth queue for executing the second sub-instruction, respectively, to wait for subsequent execution, if the number of items in the first queue in the idle state and the number of items in the second queue in the idle state are sufficient to be able to write the target address and the target data simultaneously; or,
In the event that at least one of the number of items in the first queue that are in the idle state and the number of items in the second queue that are in the idle state is insufficient to enable simultaneous writing of the target address and the target data, processing of the first sub-instruction and the second sub-instruction is aborted.
7. The data caching method according to claim 6, further comprising, before receiving the target data obtained by executing the target instruction and the target address corresponding to the target data:
after the first sub-instruction is written into the third queue, the mark information of the item corresponding to the ending pointer of the first sub-instruction in the first queue is stored in the third queue;
after the second sub-instruction is written into the fourth queue, the mark information of the item corresponding to the end pointer of the second sub-instruction in the second queue is stored in the fourth queue.
8. The data caching method according to claim 7, further comprising, after saving flag information of an item corresponding to an end pointer of the first sub-instruction in the first queue in a third queue:
The position of the first item is reserved for writing the target address in the first queue through an end pointer of the first queue.
9. The data caching method according to claim 7, further comprising, after saving flag information of an item corresponding to an end pointer of the second sub-instruction in the second queue in a fourth queue:
and reserving a second item group comprising the first number of second items in the second queue through an end pointer of the second queue for writing target data corresponding to the target address.
10. The data caching method of claim 7, after writing the first sub-instruction and the second sub-instruction into a third queue and a fourth queue, respectively, further comprising:
and executing the first sub-instruction to obtain the target address, and executing the second sub-instruction to obtain the target data.
11. The data caching method of claim 8, wherein the writing the target address into the first entry of the first queue comprises:
and writing the target address into the first item reserved in the first queue based on the mark information corresponding to the first queue.
12. The data caching method of claim 9, wherein the writing the target data into a second set of entries in a second queue independent of the first queue comprises:
and writing the target data into the second item group reserved in the second queue based on the mark information corresponding to the second queue.
13. The data caching method of any one of claims 1-9, further comprising:
after the target address is written in the first item and the target data is written in the second item group, the target address and the target data are externally provided and a commit bit corresponding to the first item is set to a valid state.
14. The data caching method of claim 13, further comprising, after the target address and the target data are provided externally:
and respectively releasing the first item in the first queue and the second item group in the second queue, and setting the commit bit corresponding to the first item to be in an invalid state.
15. The data caching method of claim 14, further comprising, after setting the commit bit corresponding to the first item to an invalid state:
reclaiming the first item in the first queue by a start finger of the first queue;
and reclaiming, in the second queue, the second items included in the second item group by a start pointer of the second queue.
16. A data caching apparatus, comprising:
the receiving module is configured to receive target data obtained by executing a target instruction and a target address corresponding to the target data;
a first write module configured to write the target address into a first entry in a first queue;
a second writing module configured to write the target data into a second item group in a second queue independent of the first queue;
wherein the second set of items comprises at least one consecutive second item.
17. A data caching apparatus, comprising:
a processing unit;
a memory having executable instructions stored thereon;
wherein the executable instructions, when executed by the processing unit, implement the data caching method of any one of claims 1-15.
18. A processor comprising a data caching apparatus according to claim 16 or 17.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311312225.6A CN117348934A (en) | 2023-10-10 | 2023-10-10 | Data caching method, data caching device and processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311312225.6A CN117348934A (en) | 2023-10-10 | 2023-10-10 | Data caching method, data caching device and processor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117348934A true CN117348934A (en) | 2024-01-05 |
Family
ID=89360680
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311312225.6A Pending CN117348934A (en) | 2023-10-10 | 2023-10-10 | Data caching method, data caching device and processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117348934A (en) |
2023-10-10: CN application CN202311312225.6A filed, publication CN117348934A (en), status: Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||