CN118796272A

CN118796272A - Memory access method, processor, electronic device and readable storage medium

Info

Publication number: CN118796272A
Application number: CN202411287927.8A
Authority: CN
Inventors: 林明锋; 朱航; 张军明; 何伟; 唐丹; 包云岗
Original assignee: Beijing Open Source Chip Research Institute
Current assignee: Beijing Open Source Chip Research Institute
Priority date: 2024-09-13
Filing date: 2024-09-13
Publication date: 2024-10-18
Anticipated expiration: 2044-09-13

Abstract

An embodiment of the invention provides a memory access method, a processor, an electronic device and a readable storage medium, and relates to the technical field of computers. According to the embodiment, a reservation station splits a target vector instruction to obtain first vector elements, and sends first read requests corresponding to the first vector elements to a cache unit; the cache unit returns a target index value corresponding to each first vector element to the reservation station according to the source register number carried by the first read request; the cache unit merges the first read requests that carry the same source register number to obtain a second read request, and acquires a first register value corresponding to the second read request from the register file; and, when the execution condition of a first vector element is met, the reservation station acquires a second register value corresponding to the first vector element from the cache unit according to the target index value. The embodiment reduces the read port pressure of the register file and improves the processing frequency of the processor.

Description

Memory access method, processor, electronic device and readable storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a memory access method, a processor, an electronic device, and a readable storage medium.

Background

The vector extended instruction set (RISC-V Vector Extension, RVV) of RISC-V is an important extension of the RISC-V architecture, aimed at improving the performance of processors on vector and parallel computing tasks. RVV allows developers to dynamically adjust the scale of vector operations during processor operation according to specific application requirements by introducing flexible vector length design, which greatly enhances the flexibility and efficiency of RISC-V processors in processing vector data of different sizes and complexities.

In RVV, a vector instruction may be split into multiple vector elements for execution. For example, where the total length of a single vector register (Vector Length, VLEN) is 128 bits, the number of "logical" lanes in a vector register (the vector register group multiplier, LMUL) is 8, and the nominal width of a single vector element (Selected Element Width, SEW) is 8 bits, the instruction is split into 128 vector elements for execution.

However, each vector element needs to read the register file (e.g., the integer register file and the vector register file) during execution. Especially when vector elements are executed with high concurrency, this increases the pressure on the read ports of the register file, which reduces the processing frequency of the processor.

Disclosure of Invention

The embodiment of the invention provides a memory access method, a processor, electronic equipment and a readable storage medium, which can solve the problems that the read port pressure of a register file is increased and the processing frequency of a processor is reduced in the process of executing vector instructions by a processor in the related technology.

In order to solve the above problems, an embodiment of the present invention discloses a memory access method, which is applied to a processor, wherein the processor includes a register file, a reservation station, and a cache unit disposed in the reservation station; the method comprises the following steps:

the reservation station splits a target vector instruction to obtain at least two first vector elements, and sends first read requests corresponding to the first vector elements to the cache unit; each first read request carries the source register number of the corresponding first vector element;

the cache unit receives the first read requests and returns a target index value corresponding to each first vector element to the reservation station according to the source register number carried by the first read request;

the cache unit merges the received first read requests that carry the same source register number to obtain at least one second read request;

the cache unit acquires a first register value corresponding to the second read request from the register file;

and, when the execution condition of the first vector element is met, the reservation station acquires a second register value corresponding to the first vector element from the cache unit according to the target index value.

In another aspect, an embodiment of the invention discloses a processor, which includes a register file, a reservation station, and a cache unit disposed in the reservation station;

the reservation station is configured to split a target vector instruction to obtain at least two first vector elements and to send first read requests corresponding to the first vector elements to the cache unit; each first read request carries the source register number of the corresponding first vector element;

the cache unit is configured to receive the first read requests and return, to the reservation station, a target index value corresponding to each first vector element according to the source register number carried by the first read request; to merge the received first read requests that carry the same source register number to obtain at least one second read request; and to acquire a first register value corresponding to the second read request from the register file;

the reservation station is further configured to, when the execution condition of the first vector element is satisfied, acquire, from the cache unit, a second register value corresponding to the first vector element according to the target index value.

In still another aspect, the embodiment of the invention also discloses an electronic device, which comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is used for storing executable instructions, and the executable instructions enable the processor to execute the memory access method.

The embodiment of the invention also discloses a readable storage medium; when the instructions in the readable storage medium are executed by the processor of an electronic device, the electronic device is enabled to execute the memory access method.

The embodiment of the invention has the following advantages:

An embodiment of the invention provides a memory access method. When a reservation station needs to access a register file to acquire the register value corresponding to a target vector instruction, the reservation station splits the target vector instruction into at least two first vector elements and sends first read requests corresponding to the first vector elements to a cache unit arranged in the reservation station. The cache unit returns a target index value corresponding to each first vector element to the reservation station according to the source register number carried by the first read request, merges the received first read requests that carry the same source register number, and accesses the register file based on the merged second read request to acquire the first register value corresponding to the second read request. This avoids repeated accesses to the register file and reduces the number of register file accesses, thereby reducing the read port pressure of the register file. When the execution condition of a first vector element is met, the reservation station can obtain the second register value corresponding to the first vector element directly from the cache unit according to the target index value, without accessing the register file again. This improves the efficiency of reading register values during parallel execution of the first vector elements, and thus improves the processing frequency of the processor.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of steps of an embodiment of a memory access method of the present invention;

FIG. 2 is a logical block diagram of a processor of the present invention;

FIG. 3 is a block diagram of another processor of the present invention;

FIG. 4 is a block diagram of an electronic device for memory access provided by an example of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terms "first", "second" and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used may be interchanged, as appropriate, such that embodiments of the present invention may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first", "second", etc. are generally of one type, the number of objects not being limited; for example, the first object may be one or more. Furthermore, the term "and/or" as used in the specification and claims to describe an association of associated objects means that there may be three relationships; e.g., A and/or B may mean: A exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. The term "plurality" in embodiments of the present invention means two or more, and other quantifiers are similar.

Method embodiment

Referring to FIG. 1, a flow chart of the steps of an embodiment of a memory access method of the present invention is shown. The method may specifically include the following steps:

Step 101, the reservation station splits a target vector instruction to obtain at least two first vector elements, and sends first read requests corresponding to the first vector elements to the cache unit; each first read request carries the source register number of the corresponding first vector element.

Step 102, the cache unit receives the first read requests, and returns a target index value corresponding to each first vector element to the reservation station according to the source register number carried by the first read request.

Step 103, the cache unit merges the received first read requests that carry the same source register number to obtain at least one second read request.

Step 104, the cache unit acquires a first register value corresponding to the second read request from the register file.

Step 105, when the execution condition of the first vector element is satisfied, the reservation station obtains a second register value corresponding to the first vector element from the cache unit according to the target index value.

The memory access method provided by the embodiment of the invention can be applied to a processor, and the processor can be a vector processor based on the RVV instruction set. A vector processor (also called an array processor) is a central processing unit (Central Processing Unit, CPU) whose instruction set operates directly on one-dimensional arrays (vectors). A vector processor can operate on an entire set of data at once, which greatly improves performance in specific workloads, especially numerical simulation and other fields that require processing large amounts of similar data.

Referring to FIG. 2, a logical block diagram of a processor of the present invention is shown. In an embodiment of the present invention, the processor includes a register file 20, a reservation station 10, and a cache unit 11. The register file (Register File) is a key hardware component in a computer processor. It consists mainly of an array of registers, usually implemented with fast static random access memory (Static Random Access Memory, SRAM), and has dedicated read ports and write ports, so that different registers can be accessed concurrently through multiple ports. The memory access method provided by the embodiment of the invention is mainly used to reduce the pressure on the read ports of the register file when the first vector elements are executed concurrently.

The reservation station (Reservation Station, also called an issue queue) is a structure in the processor that temporarily stores instructions waiting for execution together with their operands. Combined with techniques such as register renaming, the reservation station enables out-of-order execution and parallel processing of instructions, improving the performance of the processor.

The cache unit (regCache) is a functional unit arranged in the reservation station. It receives read requests sent by the reservation station, reads the register values corresponding to those requests from the register file through a read port of the register file, and stores the values read, so that when a target instruction needs to be issued the reservation station can obtain the target register value corresponding to the target instruction directly from the register values stored in the cache unit.

Specifically, the instruction to be executed in the processor is sent to the reservation station, and after the reservation station receives the target vector instruction, before the target vector instruction is executed, the target vector instruction is split into at least two first vector elements through step 101; it will be appreciated that concatenating the individual first vector elements may result in a complete target vector instruction.

The target vector instruction is any vector instruction in vector instructions which the processor needs to execute.

In the embodiment of the invention, the reservation station can split the target vector instruction according to the length of a single vector source register corresponding to the target vector instruction, the number of logic channels of the vector source register and the nominal width of vector elements in the vector source register to obtain at least two first vector elements.

Specifically, the number of first vector elements may be calculated by the following formula:

$$N = \frac{VLEN \times LMUL}{SEW} \tag{1}$$

where $N$ represents the number of first vector elements; $VLEN$ represents the length of a single vector source register corresponding to the target vector instruction; $LMUL$ represents the number of "logical" lanes of the vector source register; and $SEW$ represents the nominal width of the vector elements in the vector source register.

Illustratively, in the case where the length of the single vector source register corresponding to the target vector instruction is 128 bits, the number of "logical" lanes of the vector source register is 8, and the nominal width of the vector elements in the vector source register is 8 bits, the number of first vector elements obtained in step 101 is 128.
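The element count in this example can be checked with a minimal sketch. It assumes the count is the register length multiplied by the number of logical lanes and divided by the element width, which is consistent with the figures above; the function name `num_vector_elements` and its parameter names are illustrative and do not come from the embodiment.

```python
def num_vector_elements(vlen_bits: int, lmul: int, sew_bits: int) -> int:
    """Number of first vector elements, assuming N = VLEN * LMUL / SEW."""
    total_bits = vlen_bits * lmul
    if sew_bits <= 0 or total_bits % sew_bits != 0:
        raise ValueError("SEW must be positive and divide VLEN * LMUL")
    return total_bits // sew_bits


# VLEN = 128 bits, LMUL = 8, SEW = 8 bits, as in the example above.
print(num_vector_elements(128, 8, 8))  # 128
```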

It will be appreciated that the respective first vector elements may be executed concurrently, and that the reservation station needs to obtain the register values required to execute the respective first vector elements prior to concurrent execution. In an embodiment of the present invention, after the first vector elements are obtained in step 101, the reservation station may generate, for each first vector element, a first read request corresponding to the first vector element, and send the first read request corresponding to each first vector element to the cache unit.

The number of the first read requests is the same as the number of the first vector elements, the first read requests carry source register numbers of the first vector elements, and second register values corresponding to the first vector elements are stored in source registers corresponding to the source register numbers.

It should be noted that, the source register corresponding to the source register number is a register in the register file, and the reservation station may access the source register corresponding to the source register number through a read port of the register file, and obtain a register value from the source register; the source registers corresponding to the source register numbers may include, but are not limited to, vector source registers in a register file, integer source registers, and the like.

In the embodiment of the present invention, when the cache unit receives the first read request, the index value corresponding to the source register number may be determined locally from the cache unit according to the source register number carried by the first read request; then, the index value corresponding to the source register number is determined as the target index value corresponding to the first vector element, and the target index value is returned to the reservation station.

It should be noted that, before step 102, the local cache area of the cache unit may be in a blank state or a non-blank state.

Specifically, the fact that the local cache area of the cache unit is in a non-blank state means that, before step 102, the cache unit has already stored locally at least one index value corresponding to the register number and a register value corresponding to the register number; in the local cache area of the cache unit, the register number, the index value and the register value can be stored according to a preset corresponding relation, so that the cache unit can determine the index value corresponding to the register number according to the register number and determine the register value corresponding to the register number according to the index value.

The register numbers are numbers of registers in the register file, and the register values corresponding to the register numbers are register data stored in the registers corresponding to the register numbers. It will be appreciated that the registers in the register file comprise the source registers corresponding to the target vector instruction, and that the register numbers comprise the source register numbers of the first vector elements. It should be noted that, the register number locally stored in the cache unit is the number of at least one register in the register file, and before step 102, the register number stored in the cache unit may or may not include the source register number of the first vector element.

The storage form of the register number, the index value and the register value stored according to the preset corresponding relation may specifically include, but is not limited to: linked lists, arrays, hash tables, etc.

The local cache area of the cache unit being in a blank state means that, before step 102, the cache unit has not locally stored any register number, index value, or register value.

In the embodiment of the invention, when the cache unit receives a first read request sent by the reservation station and its local cache area is in a blank state, the cache unit creates, in the cache area, an index value corresponding to the source register number carried by the first read request, and determines the created index value as the target index value. For example, if the local cache area is in a blank state, the index value created in the cache area for the source register number carried by the first read request may be id0.

If the local cache area is in a non-blank state, the cache unit may first search an index value corresponding to a source register number carried by the first read request from the local cache area according to the source register number; under the condition that an index value corresponding to the source register number exists in a local cache area, determining the index value corresponding to the source register number as a target index value corresponding to the first vector element, and returning the target index value to the reservation station; and under the condition that the index value corresponding to the source register number does not exist in the local cache area, creating the index value corresponding to the source register number carried by the first reading request in the local cache area as a target index value corresponding to the first vector element, and sending the target index value to the reservation station. When the local cache area is in a non-blank state, the index value created in the local cache area is an index value different from the index value already stored locally; for example, the index values in the cache unit are created in order from small to large, and the index values stored in the current cache area include id0, id1, and id2, then the index value corresponding to the source register number carried by the first read request that the cache unit continues to create in the local cache area may be id3.
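The lookup-or-create behaviour for index values described above can be sketched as follows. This is a minimal software model under stated assumptions, not the hardware structure of the embodiment: the class name `IndexTable`, the method `lookup_or_create`, the use of a dictionary, and the absence of any capacity limit or eviction are all hypothetical choices made for illustration.

```python
class IndexTable:
    """Hypothetical model of the cache unit's register-number-to-index mapping."""

    def __init__(self) -> None:
        # Maps a source register number to its index value (0 stands for id0).
        self._index_of: dict[int, int] = {}

    def lookup_or_create(self, src_reg: int) -> int:
        # Reuse the stored index if this register number is already present.
        if src_reg in self._index_of:
            return self._index_of[src_reg]
        # Otherwise create the next index in increasing order (id0, id1, ...).
        new_index = len(self._index_of)
        self._index_of[src_reg] = new_index
        return new_index


table = IndexTable()
first = table.lookup_or_create(3)  # blank cache area: creates id0
again = table.lookup_or_create(3)  # same register number: id0 is reused
other = table.lookup_or_create(7)  # new register number: creates id1
print(first, again, other)  # 0 0 1
```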

In the embodiment of the present invention, when the cache unit receives the first read requests, it also merges, in step 103, the received first read requests that carry the same source register number, according to the source register numbers carried by the first read requests, to obtain at least one second read request.

It should be noted that merging the first read requests that carry the same source register number may specifically be: of at least two first read requests carrying the same source register number, retaining any one and determining the retained first read request as a second read request. In addition, where the first read requests received by the cache unit include first read requests whose source register numbers differ from all others, the cache unit may determine each such first read request as a second read request. The source register numbers carried by the second read requests obtained in step 103 are therefore all different from one another, which avoids repeated accesses by the cache unit to the register file.

It is understood that the second read request includes: the request obtained by merging the requests carrying the same source register number in the first read request and/or the first read request carrying different source register numbers (i.e. the first read request which does not need to be merged).
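The merging in step 103 amounts to keeping one request per distinct source register number. The sketch below illustrates this, modelling each first read request by its source register number only; the function name `merge_read_requests` and the choice to keep the first request seen (the text allows any one to be retained) are assumptions.

```python
def merge_read_requests(first_requests: list[int]) -> list[int]:
    """Merge first read requests, given as source register numbers.

    One request is kept per distinct source register number, in first-seen
    order; the kept requests play the role of the second read requests.
    """
    seen: set[int] = set()
    second_requests: list[int] = []
    for src_reg in first_requests:
        if src_reg not in seen:
            seen.add(src_reg)
            second_requests.append(src_reg)
    return second_requests


print(merge_read_requests([3, 3, 5, 3, 7, 5]))  # [3, 5, 7]
```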

In an alternative embodiment, the cache unit may perform step 103 when the number of received first read requests reaches a first preset threshold; illustratively, the first preset threshold may be 5, 10, 20, etc.

In another alternative embodiment, the cache unit may perform step 103 according to a preset duration. Specifically, the cache unit performs step 103 when the time elapsed between a first time, at which step 103 was performed, and the current time reaches the preset duration; illustratively, the preset duration may be 1s, 5s, 30s, etc.
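The two alternative trigger conditions for step 103 can be modelled roughly as follows. The class `MergeTrigger` and its members are hypothetical, time is passed in explicitly as a number of seconds rather than read from a clock, and measuring the duration from the most recent merge is an interpretation of the text rather than something it states.

```python
from typing import Optional


class MergeTrigger:
    """Hypothetical decision logic for when the cache unit performs step 103."""

    def __init__(self, count_threshold: Optional[int] = None,
                 period: Optional[float] = None) -> None:
        self.count_threshold = count_threshold  # first preset threshold
        self.period = period                    # preset duration, in seconds
        self.pending = 0                        # first read requests not yet merged
        self.last_merge_time = 0.0

    def on_first_read_request(self) -> None:
        self.pending += 1

    def should_merge(self, now: float) -> bool:
        if self.count_threshold is not None and self.pending >= self.count_threshold:
            return True
        if self.period is not None and now - self.last_merge_time >= self.period:
            return True
        return False

    def mark_merged(self, now: float) -> None:
        self.pending = 0
        self.last_merge_time = now


by_count = MergeTrigger(count_threshold=5)
for _ in range(4):
    by_count.on_first_read_request()
before = by_count.should_merge(now=0.0)  # 4 pending requests: below the threshold
by_count.on_first_read_request()
after = by_count.should_merge(now=0.0)   # 5 pending requests: threshold reached

by_time = MergeTrigger(period=1.0)
early = by_time.should_merge(now=0.5)    # 0.5 s elapsed: too early
late = by_time.should_merge(now=1.0)     # 1.0 s elapsed: duration reached
print(before, after, early, late)  # False True False True
```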

In the embodiment of the present invention, once the second read request is obtained in step 103, the cache unit may read a register value from the register in the register file that corresponds to the source register number carried by the second read request, and determine the read register value as the first register value corresponding to the second read request. The cache unit then stores the first register value in its local cache area according to the correspondence between the source register number carried by the second read request and the target index value returned to the reservation station in step 102.

It should be noted that the first register value that the cache unit acquires from the register file for the second read request is all of the data stored in the register corresponding to the source register number carried by the second read request.

In the embodiment of the present invention, the reservation station obtaining the second register value corresponding to the first vector element from the cache unit according to the target index value, when the execution condition of the first vector element is satisfied, may specifically be as follows. When the execution condition of the first vector element is met, the reservation station sends a data acquisition request to the cache unit, the data acquisition request carrying the target index value corresponding to the first vector element. The cache unit looks up, in its local cache area, the register value corresponding to the target index value in the data acquisition request, determines that register value as the second register value, and sends the second register value to the reservation station. Upon obtaining the second register value, the reservation station may send the first vector element to a functional unit for executing the first vector element, so that the functional unit executes the first vector element based on the second register value.
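Steps 102 to 105 can be tied together in a small software model. The sketch below is illustrative only: the class `RegCache`, its method names, the dictionary standing in for the register file, and the read counter are all assumptions introduced here, and the model ignores timing, capacity, and the execution condition itself.

```python
class RegCache:
    """Hypothetical software model of the cache unit (regCache)."""

    def __init__(self, register_file: dict[int, int]) -> None:
        self.register_file = register_file   # register number -> register value
        self.index_of: dict[int, int] = {}   # source register number -> index value
        self.value_at: dict[int, int] = {}   # index value -> cached register value
        self.pending: list[int] = []         # source register numbers awaiting a read
        self.register_file_reads = 0         # counts accesses to the register file

    def first_read_request(self, src_reg: int) -> int:
        """Step 102: return the target index value for the source register number."""
        if src_reg not in self.index_of:
            self.index_of[src_reg] = len(self.index_of)
        self.pending.append(src_reg)
        return self.index_of[src_reg]

    def merge_and_fill(self) -> None:
        """Steps 103 and 104: merge pending requests, then read each register once."""
        for src_reg in dict.fromkeys(self.pending):  # one entry per distinct number
            self.value_at[self.index_of[src_reg]] = self.register_file[src_reg]
            self.register_file_reads += 1
        self.pending.clear()

    def data_acquisition_request(self, index: int) -> int:
        """Step 105: return the cached register value for the target index value."""
        return self.value_at[index]


cache = RegCache(register_file={3: 0x1000, 5: 0x2000})
indices = [cache.first_read_request(r) for r in (3, 3, 5, 3)]
cache.merge_and_fill()
values = [cache.data_acquisition_request(i) for i in indices]
print(indices)                    # [0, 0, 1, 0]
print(values)                     # [4096, 4096, 8192, 4096]
print(cache.register_file_reads)  # 2
```

In this model four first read requests lead to only two register file reads, one per distinct source register number.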

Wherein the execution condition of the first vector element is a condition for transmitting the first vector element from the reservation station for execution, the execution condition of the first vector element in the embodiment of the present invention may include, but is not limited to: the functional unit in the processor for executing the first vector element is in an available state, dependent instruction execution of the first vector element is complete, etc.

It should be noted that the second register value corresponding to the first vector element may be the same register value as the first register value obtained in step 104, or may be a different register value from the first register value obtained in step 104; the second register value may also be a data segment of the first register value obtained in step 104.

Where the second register value is a data segment of the first register value, the reservation station may acquire, before step 105, a configuration parameter of the first vector element from a configuration register corresponding to the target vector instruction, where the configuration parameter includes the address offset, within the first register value, of the second register value corresponding to the first vector element. In this case, when the execution condition of the first vector element is met and the reservation station acquires the second register value from the cache unit according to the target index value, the data acquisition request sent to the cache unit also carries that address offset. On receiving the data acquisition request, the cache unit looks up, in its local cache area, the first register value corresponding to the target index value in the request, then determines the second register value corresponding to the first vector element within that first register value according to the address offset in the request, and sends the second register value to the reservation station.
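Selecting a data segment of the cached first register value by address offset can be sketched as below. The text specifies only an address offset; the byte granularity, the explicit `width` parameter, and the function name `select_segment` are assumptions made for illustration.

```python
def select_segment(first_register_value: bytes, offset: int, width: int) -> bytes:
    """Return the `width`-byte data segment starting at byte `offset`."""
    if offset < 0 or width <= 0 or offset + width > len(first_register_value):
        raise ValueError("segment lies outside the cached register value")
    return first_register_value[offset:offset + width]


cached = bytes(range(16))  # a 128-bit register value held in the cache unit
segment = select_segment(cached, offset=4, width=2)
print(segment.hex())  # 0405
```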

In an embodiment of the present invention, the target vector instruction may be a vector memory access instruction, and the reservation station may split the vector memory access instruction into at least two scalar memory access instructions (first vector elements) through step 101. Each scalar memory access instruction is denoted "ld src imm", where "ld" (load) indicates that the scalar memory access instruction is a load instruction; "src" represents the source register number corresponding to the scalar memory access instruction; and "imm" represents the immediate corresponding to the scalar memory access instruction.

Illustratively, the scalar memory access instruction is "ld 3 100", where "3" indicates that the source register number corresponding to the scalar memory access instruction is "3", that is, the register value required for executing the instruction needs to be read from the source register with register number "3", and "100" is the immediate. If the reservation station acquires the second register value ("1") corresponding to the scalar memory access instruction through step 105, the immediate "100" may be added to the second register value "1" to give the final access address of the scalar memory access instruction. It will be appreciated that once the final access address of the scalar memory access instruction is determined, the reservation station may issue the scalar memory access instruction to the corresponding functional unit for execution.
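The address computation for "ld 3 100" is a single addition, sketched below. The function name `effective_address` is hypothetical, and for brevity the cached values are keyed directly by source register number rather than by target index value.

```python
def effective_address(cached_values: dict[int, int], src: int, imm: int) -> int:
    """Final access address of `ld src imm`: source register value plus immediate."""
    return cached_values[src] + imm


# Register number 3 holds the value 1, as in the example above.
print(effective_address({3: 1}, src=3, imm=100))  # 101
```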

In addition, the vector memory access instruction further includes a memory access format. For example, for a stride-type memory access instruction, the memory access format includes the destination register number and the source register number of the vector memory access instruction, as well as a configuration parameter corresponding to the vector memory access instruction that is written in the configuration register; the configuration parameter indicates the address offset applied each time the vector memory access instruction accesses the register file and the number of times the vector memory access instruction needs to access the register file.

Illustratively, in the case where the address offset is 100 and the number of accesses is 4, then the reservation station may split one vector memory access instruction into 4 scalar memory access instructions as follows:

Scalar access instruction 1: Src+100;

Scalar access instruction 2: Src+200;

Scalar access instruction 3: Src+300;

Scalar access instruction 4: Src+400.

Since the source register numbers of the 4 scalar access instructions are all "Src", in step 101 the source register numbers carried in first read request 1 (corresponding to scalar access instruction 1), first read request 2 (corresponding to scalar access instruction 2), first read request 3 (corresponding to scalar access instruction 3), and first read request 4 (corresponding to scalar access instruction 4), which the reservation station sends to the cache unit, are all "Src".

In step 102, the target index values that the cache unit returns to the reservation station for the scalar access instructions are the same, and are all idx0:

Scalar access instruction 1: Src+100→idx0;

Scalar access instruction 2: Src+200→idx0;

Scalar access instruction 3: Src+300→idx0;

Scalar access instruction 4: Src+400→idx0.

In step 103, the cache unit may merge first read requests 1 to 4 into a single second read request, which carries the source register number "Src". In step 104, the cache unit therefore needs to access the register file only once to obtain the first register value for all 4 first read requests, which reduces the number of register file accesses. In step 105, the reservation station may obtain the second register value corresponding to each scalar access instruction from the first register value stored in the cache unit, according to the target index value and the address offset of that scalar access instruction.
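
The stride example above can be sketched as follows. This is only an illustrative model, not the implementation of the embodiment: the content of register "Src" (1, borrowed from the earlier "ld 3 100" example), the read counter and all function and variable names are assumptions introduced here.

```python
# Hypothetical model: names and register contents are illustrative only.
register_file = {"Src": 1}      # assumed content of source register "Src"
register_file_reads = 0         # counts accesses to the register file


def read_register_file(src):
    """Read one source register and count the access."""
    global register_file_reads
    register_file_reads += 1
    return register_file[src]


# Steps 101-102: four first read requests, all carrying "Src", share idx0.
offsets = [100, 200, 300, 400]
first_read_requests = [("Src", off) for off in offsets]
index_of = {"Src": "idx0"}

# Steps 103-104: one merged second read request, hence one register file read.
merged_sources = sorted({src for src, _ in first_read_requests})
cache = {index_of[src]: read_register_file(src) for src in merged_sources}

# Step 105: each scalar access instruction reads the cached value via idx0.
addresses = [cache[index_of[src]] + off for src, off in first_read_requests]

print(register_file_reads)  # 1
print(addresses)            # [101, 201, 301, 401]
```

Under these assumptions, four scalar access instructions are served by a single register file access.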

As another example, suppose the reservation station splits a vector memory access instruction into 4 scalar access instructions:

Scalar access instruction 1: Src0+100;

Scalar access instruction 2: Src0+100;

Scalar access instruction 3: Src1+100;

Scalar access instruction 4: Src1+100.

Because the source register numbers of the 4 scalar access instructions are "Src0" and "Src1", in step 101 the first read request 1 and the first read request 2, which correspond to scalar access instructions 1 and 2 and are sent by the reservation station to the cache unit, both carry the source register number "Src0", while the first read request 3 and the first read request 4, which correspond to scalar access instructions 3 and 4, both carry the source register number "Src1". In step 102, the target index values that the cache unit returns to the reservation station for the scalar access instructions are as follows:

Scalar access instruction 1: Src0+100→idx0;

Scalar access instruction 2: Src0+100→idx0;

Scalar access instruction 3: Src1+100→idx1;

Scalar access instruction 4: Src1+100→idx1.

In step 103, the cache unit may merge the first read request 1 and the first read request 2 into a second read request 1, which carries the source register number "Src0", and merge the first read request 3 and the first read request 4 into a second read request 2, which carries the source register number "Src1". In step 104, the cache unit needs to access the register file only twice to obtain the first register values corresponding to the 4 first read requests, which reduces the number of register file accesses.
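
The merging in step 103 can be sketched as a grouping of first read requests by source register number. The function name and the request representation (a request identifier paired with a source register number) are assumptions made for illustration; the embodiment does not prescribe a data layout.

```python
def merge_first_read_requests(first_read_requests):
    """Group first read requests by the source register number they carry.

    Each key of the result stands for one merged second read request;
    the value lists the first read requests that it covers.
    """
    merged = {}
    for request_id, src in first_read_requests:
        merged.setdefault(src, []).append(request_id)
    return merged


requests = [(1, "Src0"), (2, "Src0"), (3, "Src1"), (4, "Src1")]
second_read_requests = merge_first_read_requests(requests)
print(second_read_requests)       # {'Src0': [1, 2], 'Src1': [3, 4]}
print(len(second_read_requests))  # 2
```

The number of keys equals the number of register file accesses needed in step 104: two here, and one in the earlier example where all four requests carry "Src".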

It will be appreciated that the storage capacity of the cache unit is fixed and that the cache unit can store at least one first register value locally. Each time the cache unit acquires a first register value in step 104, and before storing it, the cache unit may first determine whether its local cache area is full.

When the local cache area of the cache unit is not full, the cache unit may store the first register value acquired in step 104 in the local cache area of the cache unit according to the correspondence between the source register number and the target index value carried by the second read request.

Under the condition that the local cache area of the cache unit is full, the cache unit firstly releases at least one first register value stored locally according to a preset cache release rule, and then stores the first register value acquired through the step 104 in the local cache area of the cache unit according to the corresponding relation between the source register number and the target index value carried by the second read request.

The preset cache release rule may include, but is not limited to: preferentially releasing the register value that has been stored locally in the cache unit for the longest time, preferentially releasing the register value that the reservation station has acquired through step 105 the fewest times, and so on.
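
A minimal sketch of a fixed-capacity store with the two release rules named above is given below. The class name, the rule names "oldest" and "least_used", and the logical clock used to order entries are assumptions for illustration only.

```python
class RegisterValueCache:
    """Fixed-capacity store of register values keyed by target index value."""

    def __init__(self, capacity, rule="oldest"):
        self.capacity = capacity
        self.rule = rule          # "oldest" or "least_used" (assumed names)
        self.entries = {}         # idx -> [value, stored_at, use_count]
        self.clock = 0            # logical time used to order stores

    def store(self, idx, value):
        # Release one entry first if the local cache area is full.
        if idx not in self.entries and len(self.entries) >= self.capacity:
            self._release_one()
        self.clock += 1
        self.entries[idx] = [value, self.clock, 0]

    def get(self, idx):
        entry = self.entries[idx]
        entry[2] += 1             # count acquisitions made through step 105
        return entry[0]

    def _release_one(self):
        if self.rule == "oldest":
            victim = min(self.entries, key=lambda i: self.entries[i][1])
        else:
            victim = min(self.entries, key=lambda i: self.entries[i][2])
        del self.entries[victim]


cache = RegisterValueCache(capacity=2, rule="least_used")
cache.store("idx0", 11)
cache.store("idx1", 22)
cache.get("idx0")
cache.store("idx2", 33)        # full: idx1 has the fewest acquisitions
print(sorted(cache.entries))   # ['idx0', 'idx2']
```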

In the related art, when a vector instruction is split into a plurality of vector elements for execution, each vector element needs to read the register file during execution. To execute the vector elements with high concurrency, a large number of register file read ports is generally required. These read ports increase the area and power consumption of the register file and lengthen the timing paths of the processor, which seriously reduces the processing frequency of the processor. The embodiment of the invention provides a memory access method in which a cache unit is added to the reservation station. The cache unit returns a target index value corresponding to the first vector element to the reservation station according to the source register number carried by the first read request, merges the received first read requests that carry the same source register number, and accesses the register file based on the merged second read request to acquire the first register value corresponding to the second read request. This avoids repeated accesses to the register file and reduces the number of register file accesses, thereby reducing the read port pressure on the register file.

It may be understood that, in the embodiment of the present invention, the cache unit needs to store only a small number of register values from the register file locally in order to provide second register values for a large number of first vector elements. When the execution condition of a first vector element is satisfied, the second register value corresponding to that first vector element can be obtained directly from the cache unit according to its target index value, without accessing the register file again. A larger issue window is thus provided for the target vector instruction at a lower register file read port overhead, which improves the efficiency of reading register values during parallel execution of the first vector elements and in turn improves the processing frequency of the processor.

In an alternative embodiment of the present invention, step 102, in which the cache unit receives the first read request and returns, to the reservation station, a target index value corresponding to the first vector element according to the source register number carried by the first read request, includes:

Step 1021, the cache unit receives the first read request, and determines, according to the source register number carried by the first read request, whether a first index value corresponding to the source register number exists locally.

Step 1022, the cache unit determines the first index value as the target index value corresponding to the first vector element when the first index value exists locally, and returns the target index value to the reservation station.

Step 1023, the cache unit creates a target index value corresponding to the first vector element and returns the target index value to the reservation station when the first index value does not exist locally.

In the embodiment of the present invention, when the cache unit receives the first read request sent by the reservation station, it may return the target index value corresponding to the first vector element to the reservation station, according to the source register number carried by the first read request, through the operations of steps 1021 to 1023.

Specifically, when receiving the first read request sent by the reservation station, the cache unit may first obtain the source register number carried by the first read request, and then match it against the register numbers corresponding to the locally stored index values. If one of those register numbers is the same as the source register number carried by the first read request, the first index value corresponding to that source register number exists locally in the cache unit. If none of them is the same, the first index value does not exist locally in the cache unit.

In the case that the first index value exists locally in the cache unit, the cache unit may directly determine the first index value as a target index value corresponding to the first vector element, and return the target index value to the reservation station. In the case that the first index value does not exist locally in the cache unit, the cache unit may create a target index value corresponding to the first vector element locally, and return the target index value to the reservation station.

It will be appreciated that, for first vector elements that have the same source register number, the cache unit returns the same target index value to the reservation station.
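
Steps 1021 to 1023 amount to a lookup-or-create operation keyed by source register number. The sketch below assumes sequential integer index values and releases no entries; the class and method names are hypothetical and are not taken from the embodiment.

```python
class IndexTable:
    """Hypothetical model of the index bookkeeping inside the cache unit."""

    def __init__(self):
        self.index_of = {}  # source register number -> index value

    def target_index(self, src_reg):
        # Step 1021: check whether a first index value exists locally.
        if src_reg in self.index_of:
            # Step 1022: reuse the existing index value as the target index value.
            return self.index_of[src_reg]
        # Step 1023: create a new target index value
        # (sequential numbering is an assumption of this sketch).
        idx = len(self.index_of)
        self.index_of[src_reg] = idx
        return idx


table = IndexTable()
returned = [table.target_index(src) for src in ["Src0", "Src0", "Src1", "Src1"]]
print(returned)  # [0, 0, 1, 1]
```

Requests that carry the same source register number receive the same target index value, which matches the idx0/idx1 assignment in the earlier example.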

According to the access method provided by the embodiment of the invention, when the cache unit receives the first reading request, firstly, according to the source register number carried by the first reading request, whether a first index value corresponding to the source register number exists locally is determined, and when the first index value exists locally, the first index value is determined to be a target index value corresponding to the first vector element, and when the first index value does not exist locally, the target index value corresponding to the first vector element is created, so that the cache unit can return the target index value corresponding to each first vector element to the reservation station, and the reservation station can accurately acquire a second register value corresponding to the first vector element according to the target index value in step 105.

In an alternative embodiment of the present invention, step 104, in which the cache unit obtains, from the register file, a first register value corresponding to the second read request, includes:

Step 1041, when the first register value corresponding to the source register number carried by the second read request does not exist locally, the cache unit obtains the first register value from the source register in the register file that corresponds to the source register number carried by the second read request.

Step 1042, the cache unit stores the first register value locally according to the correspondence between the source register number and the target index value.

In the embodiment of the present invention, when acquiring the first register value corresponding to the second read request, the cache unit may first confirm whether that first register value exists locally. If it exists locally, the cache unit does not need to access the register file. If it does not exist locally, the cache unit may acquire the first register value corresponding to the second read request from the register file through steps 1041 to 1042.

It will be appreciated that the first register value corresponding to the second read request is all data stored in the register corresponding to the source register number carried by the second read request.

Storing the first register value locally according to the correspondence between the source register number and the target index value specifically means the following: according to the correspondence between the source register number carried by the second read request and the target index value returned to the reservation station in step 102, the cache unit stores the first register value in the local area corresponding to that target index value. In step 105, the reservation station can then use the target index value to locate the corresponding first register value in the cache unit and obtain from it the second register value corresponding to the first vector element.

In the memory access method provided by the embodiment of the invention, the cache unit needs to acquire the first register value from the register file and store it only when the first register value corresponding to the source register number carried by the second read request does not exist locally. When that first register value already exists locally, the cache unit does not need to access the register file. This further reduces the number of times the cache unit accesses the register file and reduces the read port pressure on the register file.
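
The fetch-on-miss behaviour of steps 1041 and 1042 can be sketched as follows. The class, the dictionary-based register file and the read counter are illustrative assumptions; the sketch only shows that a repeated second read request for the same source register does not reach the register file again.

```python
class CacheUnit:
    """Hypothetical cache unit that reads the register file only on a miss."""

    def __init__(self, register_file):
        self.register_file = register_file  # source register number -> value
        self.values = {}                    # target index value -> first register value
        self.register_file_reads = 0

    def handle_second_read_request(self, src_reg, target_idx):
        if target_idx in self.values:
            # The first register value already exists locally: no register file access.
            return self.values[target_idx]
        # Step 1041: read the source register from the register file.
        self.register_file_reads += 1
        value = self.register_file[src_reg]
        # Step 1042: store the value under the target index value.
        self.values[target_idx] = value
        return value


unit = CacheUnit({"Src0": 0x1234})
first = unit.handle_second_read_request("Src0", target_idx=0)
second = unit.handle_second_read_request("Src0", target_idx=0)
print(first == second, unit.register_file_reads)  # True 1
```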

In an alternative embodiment of the present invention, the splitting of the target vector instruction by the reservation station in step 101 results in at least two first vector elements, including:

In step 1011, the reservation station determines a vector source register corresponding to the target vector instruction when receiving the target vector instruction.

Step 1012, the reservation station splits the vector instruction according to the vector source register to obtain a first vector element; the number of first vector elements is the same as the number of vector source registers.

The method further comprises the steps of:

and step A11, under the condition that the reservation station receives the target index value corresponding to the first vector element, carrying out secondary splitting on the first vector element according to the length of the vector source register and the nominal width of the vector element in the vector source register to obtain at least two second vector elements.

The source registers corresponding to the target vector instruction comprise vector source registers, and the number of the vector source registers is at least 2.

In the embodiment of the invention, the splitting of the target vector instruction by the reservation station comprises a primary split and a secondary split. The reservation station may perform the primary split of the target vector instruction through steps 1011 to 1012 to obtain the first vector elements, and perform the secondary split of each first vector element through step A11 to obtain the second vector elements.

The primary split basis is a vector source register corresponding to the target vector instruction, and the secondary split basis is the length of the vector source register corresponding to the target vector instruction and the nominal width of vector elements in the vector source register.

Specifically, in the primary splitting process corresponding to steps 1011 to 1012, the reservation station may first determine the source registers corresponding to the target vector instruction according to the configuration parameters corresponding to the target vector instruction, and then determine the vector source registers among those source registers.

Under the condition that a vector source register corresponding to a target vector instruction is determined, the reservation station can split the vector instruction according to the vector source register to obtain a first vector element corresponding to the vector source register; the number of the first vector elements is the same as the number of vector source registers corresponding to the target vector instruction, and each first vector element corresponds to one vector source register, and the second register value corresponding to the first vector element is stored in the vector source register corresponding to the first vector element.

In the secondary splitting process of step A11, when the reservation station receives the target index value corresponding to the first vector element that the cache unit returns through step 102, it may split the first vector element a second time according to the length of the vector source register and the nominal width of the vector elements in the vector source register, so as to obtain at least two second vector elements corresponding to the first vector element. It can be understood that splicing together the second vector elements corresponding to a first vector element yields that first vector element.

As an example, the source registers corresponding to the target vector instruction include 1 integer source register and 8 vector source registers (s0, s1, s2, s3, s4, s5, s6, s7); the length of a single vector source register corresponding to the target vector instruction is 128 bits, and the nominal width of the vector elements in the vector source register is 8 bits.

In the primary splitting process, the target vector instruction can be split, according to its vector source registers, into 8 first vector elements corresponding to the vector source registers (s0, s1, s2, s3, s4, s5, s6, s7) respectively.

In the secondary splitting process, the reservation station may split each first vector element into 128/8=16 second vector elements through step A11, so that the total number of second vector elements corresponding to the target vector instruction is 8×16=128.
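
The counts in this example can be reproduced with a short sketch of the two splitting stages. The function names and the tuple representation of a second vector element (source register, element position) are assumptions for illustration.

```python
def primary_split(vector_source_registers):
    """One first vector element per vector source register (steps 1011-1012)."""
    return list(vector_source_registers)


def secondary_split(first_vector_element, register_bits, element_bits):
    """Split one first vector element into second vector elements (step A11)."""
    count = register_bits // element_bits
    return [(first_vector_element, position) for position in range(count)]


registers = [f"s{i}" for i in range(8)]
first_elements = primary_split(registers)
second_elements = [
    element
    for first in first_elements
    for element in secondary_split(first, register_bits=128, element_bits=8)
]
print(len(first_elements), len(second_elements))  # 8 128
```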

It should be noted that, while the reservation station executes step A11, the cache unit may execute step 103 and step 104 at the same time, so that the secondary splitting of the target vector instruction and the acquisition of the first register value proceed in parallel. The cache unit uses the splitting time of the target vector instruction to access the register file and read the first register value, which hides the read latency of the cache unit and further improves the efficiency of reading register values and the processing frequency of the processor during parallel execution of the first vector elements.

In an optional embodiment of the present invention, step 105, in which the reservation station obtains, when the execution condition of the first vector element is satisfied, a second register value corresponding to the first vector element from the cache unit according to the target index value, includes:

step 1051, the reservation station obtains address offset information corresponding to any second vector element when the execution condition of the second vector element is satisfied.

Step 1052, the reservation station obtains the third register value corresponding to the second vector element from the cache unit according to the target index value and the address offset information.

The address offset information includes an address offset of a third register value corresponding to the second vector element in the second register value. The reservation station may obtain address offset information corresponding to the second vector element from a configuration register corresponding to the target vector instruction.

In the embodiment of the invention, when the process of splitting the target vector instruction by the reservation station comprises primary splitting and secondary splitting, the second register value comprises a third register value corresponding to the second vector element; it is understood that the third register value corresponding to the second vector element is a data segment in the second register value.

Specifically, when the execution condition of the first vector element is satisfied, the reservation station obtains the second register value corresponding to the first vector element from the cache unit according to the target index value as follows:

First, the reservation station acquires address offset information corresponding to the second vector element from a configuration register corresponding to the target vector instruction.

Then, the reservation station sends a data acquisition request to the cache unit, wherein the data acquisition request carries a target index value corresponding to the first vector element and address offset information corresponding to the second vector element.

Then, the cache unit looks up locally the first register value corresponding to the target index value in the data acquisition request, determines that first register value as the second register value corresponding to the first vector element, determines the third register value corresponding to the second vector element from the second register value according to the address offset information corresponding to the second vector element, and sends the third register value to the reservation station.

Finally, the reservation station receives the third register value sent by the cache unit and sends the second vector element to the functional unit that executes it, so that the functional unit executes the second vector element based on the third register value.
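
Selecting the third register value from the second register value can be sketched as a bit-field extraction. The sketch assumes that the address offset information is a bit offset counted from the least significant bit and that the element width is 8 bits as in the earlier example; the embodiment does not fix either convention, and the function name and sample register content are hypothetical.

```python
def extract_third_register_value(second_register_value, bit_offset, element_bits):
    """Return the data segment of width element_bits located at bit_offset."""
    mask = (1 << element_bits) - 1
    return (second_register_value >> bit_offset) & mask


# Assumed cache content: target index value 0 holds a 128-bit register value
# whose bytes, from least to most significant, are 0x00, 0x01, ..., 0x0F.
cached_values = {0: 0x0F0E0D0C0B0A09080706050403020100}

# Data acquisition request: target index value 0, bit offset 24 (fourth byte).
third = extract_third_register_value(cached_values[0], bit_offset=24, element_bits=8)
print(hex(third))  # 0x3
```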

In an alternative embodiment of the invention, the processor further comprises a renaming (rename) module for allocating physical registers for vector instructions in the reservation station; the method further comprises the steps of:

Step B11, the renaming module sends a synchronization signal to the cache unit when a physical register corresponding to the target vector instruction is changed; the synchronization signal indicates the register number of the physical register that has changed.

Step B12, the cache unit releases the locally stored register value corresponding to the register number indicated by the synchronization signal.

The cases in which the physical register is changed may include, but are not limited to: data update in physical registers, data deletion in physical registers, physical register changes, etc.

In the embodiment of the invention, the cache unit can be communicatively connected to the renaming module in the processor, and the renaming module sends a synchronization signal to the cache unit in real time when a physical register corresponding to a target vector instruction is changed. On receiving the synchronization signal, the cache unit releases the locally stored register value corresponding to the register number indicated by the synchronization signal. If the source register number carried by a second read request obtained later in step 103 is the register number indicated by the synchronization signal, the cache unit may acquire the first register value from the changed register again through step 104, which improves the accuracy of the register values stored locally in the cache unit.

The embodiment of the invention thus provides another way of releasing the register values stored locally in the cache unit. Specifically, the cache unit receives the synchronization signal sent by the renaming module in real time and, on receiving it, releases the locally stored register value corresponding to the register number of the changed physical register indicated by the synchronization signal, which improves the flexibility with which register values are stored in the cache unit.
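
The interaction between the renaming module and the cache unit in steps B11 and B12 can be sketched as a simple invalidation callback. The class names, method names and the register numbers "p5" and "p7" are hypothetical; the sketch keys local values directly by physical register number for brevity.

```python
class CacheUnit:
    """Hypothetical cache unit holding register values by register number."""

    def __init__(self):
        self.values = {}  # physical register number -> locally stored register value

    def on_sync_signal(self, changed_reg):
        # Step B12: release the local value for the changed register, if any.
        self.values.pop(changed_reg, None)


class RenameModule:
    """Hypothetical renaming module that notifies the cache unit of changes."""

    def __init__(self, cache_unit):
        self.cache_unit = cache_unit

    def physical_register_changed(self, reg):
        # Step B11: send a synchronization signal carrying the register number.
        self.cache_unit.on_sync_signal(reg)


cache_unit = CacheUnit()
cache_unit.values = {"p5": 111, "p7": 222}
RenameModule(cache_unit).physical_register_changed("p5")
print(cache_unit.values)  # {'p7': 222}
```

After the release, a later second read request for "p5" would miss locally and be served from the register file again through step 104.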

In summary, the embodiment of the invention provides a memory access method in which a cache unit is added to the reservation station. The cache unit returns a target index value corresponding to the first vector element to the reservation station according to the source register number carried by the first read request, merges the received first read requests that carry the same source register number, and accesses the register file based on the merged second read request to acquire the first register value corresponding to the second read request. This avoids repeated accesses to the register file and reduces the number of register file accesses, thereby reducing the read port pressure on the register file. When the execution condition of the first vector element is satisfied, the reservation station can obtain the second register value corresponding to the first vector element directly from the cache unit according to the target index value corresponding to the first vector element, without accessing the register file again. This improves the efficiency of reading register values during parallel execution of the first vector elements and in turn improves the processing frequency of the processor.

It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.

Device embodiment

Referring to fig. 3, there is shown a block diagram of another processor of the present invention, which includes a register file 20, a reservation station 10, and a cache unit 11 disposed in the reservation station 10;

The reservation station 10 is configured to split a target vector instruction to obtain at least two first vector elements, and send a first read request corresponding to the first vector elements to the cache unit 11; the first read request carries a source register number of the first vector element;

The cache unit 11 is configured to receive the first read request, and return, to the reservation station 10, a target index value corresponding to the first vector element according to a source register number carried by the first read request; to merge the received first read requests carrying the same source register number to obtain at least one second read request; and to obtain a first register value corresponding to the second read request from the register file 20;

The reservation station 10 is further configured to, when the execution condition of the first vector element is satisfied, obtain, from the cache unit 11, a second register value corresponding to the first vector element according to the target index value.

Optionally, the cache unit is specifically configured to:

Receiving the first read request, and determining whether a first index value corresponding to a source register number exists locally according to the source register number carried by the first read request;

Determining the first index value as a target index value corresponding to the first vector element under the condition that the first index value exists locally, and returning the target index value to the reservation station;

and under the condition that the first index value does not exist locally, creating a target index value corresponding to the first vector element, and returning the target index value to the reservation station.

Optionally, the cache unit is specifically configured to:

Acquiring a first register value from a source register corresponding to a source register number carried by the second read request in the register file under the condition that the first register value corresponding to the source register number carried by the second read request does not exist locally;

And storing the first register value locally according to the corresponding relation between the source register number and the target index value.

Optionally, the source register comprises a vector source register; the reservation station is specifically configured to:

under the condition that a target vector instruction is received, determining a vector source register corresponding to the target vector instruction;

Splitting the vector instruction according to the vector source register to obtain a first vector element; the number of first vector elements is the same as the number of vector source registers.

Optionally, the reservation station is further configured to:

and under the condition that a target index value corresponding to the first vector element is received, carrying out secondary splitting on the first vector element according to the length of the vector source register and the nominal width of the vector element in the vector source register to obtain at least two second vector elements.

Optionally, the second register value includes a third register value corresponding to the second vector element; the reservation station is specifically configured to:

acquiring address offset information corresponding to any one of the second vector elements when the execution condition of the second vector element is met;

And acquiring a third register value corresponding to the second vector element from the cache unit according to the target index value and the address offset information.

Optionally, the processor further includes a renaming module for allocating physical registers for vector instructions in the reservation station;

The renaming module is used for sending a synchronization signal to the cache unit under the condition that a physical register corresponding to the target vector instruction is changed; the synchronization signal is used for indicating the register number of the changed physical register;

the cache unit is further configured to release the locally stored register value corresponding to the register number indicated by the synchronization signal.

For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.

In this specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be understood by reference to one another.

The specific manner in which the various modules of the processor in the above-described embodiments perform their operations has been described in detail in the method embodiments and will not be described in detail herein.

Referring to fig. 4, a block diagram of an electronic device for memory access according to an embodiment of the present invention is shown. As shown in fig. 4, the electronic device includes a processor, a memory, a communication interface and a communication bus, where the processor, the memory and the communication interface communicate with each other through the communication bus; the memory is configured to store executable instructions that cause the processor to perform the memory access method of the foregoing embodiment.

The processor may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that performs computing functions, e.g., a combination of one or more microprocessors, a combination of a DSP and a microprocessor, etc.

The communication bus may include a path for transferring information between the memory and the communication interface. The communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in fig. 4, but this does not mean that there is only one bus or only one type of bus.

The memory may be a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.

Embodiments of the present invention also provide a non-transitory computer-readable storage medium; when instructions in the storage medium are executed by a processor of an electronic device (a server or a terminal), the processor is enabled to perform the memory access method shown in fig. 1.

It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or terminal device that comprises the element.

The memory access method, the processor, the electronic device and the readable storage medium provided by the present invention have been described in detail above. Specific examples have been used to illustrate the principles and embodiments of the present invention, and the above description of the embodiments is intended only to help in understanding the method and core idea of the present invention. Meanwhile, those skilled in the art may, following the ideas of the present invention, vary the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the present invention.

Claims

1. A memory access method, characterized in that the method is applied to a processor, the processor comprising a register file, a reservation station, and a cache unit disposed in the reservation station; the method comprises the following steps:

2. The method of claim 1, wherein the cache unit receiving the first read request and returning, to the reservation station, a target index value corresponding to the first vector element according to the source register number carried by the first read request comprises:

the cache unit receives the first read request and determines, according to the source register number carried by the first read request, whether a first index value corresponding to the source register number exists locally;

when the first index value exists locally, the cache unit determines the first index value as the target index value corresponding to the first vector element and returns the target index value to the reservation station;

when the first index value does not exist locally, the cache unit creates a target index value corresponding to the first vector element and returns the target index value to the reservation station.
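The lookup-or-create behaviour recited in claim 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `IndexTable`, the method `lookup_or_create`, and the policy of handing out index values sequentially are hypothetical, and capacity limits and eviction are ignored.

```python
class IndexTable:
    """Toy model of the cache unit's mapping from source register number to index value."""

    def __init__(self):
        self.index_of = {}    # source register number -> index value
        self.next_index = 0   # assumption: index values are handed out sequentially

    def lookup_or_create(self, src_reg_num):
        """Return (target_index, hit) for the source register number of a read request."""
        if src_reg_num in self.index_of:
            # A first index value already exists locally: reuse it as the target index.
            return self.index_of[src_reg_num], True
        # No index value exists locally: create one and remember it.
        target_index = self.next_index
        self.next_index += 1
        self.index_of[src_reg_num] = target_index
        return target_index, False


table = IndexTable()
first = table.lookup_or_create(8)    # miss: a new index value is created
second = table.lookup_or_create(9)   # miss: another new index value
third = table.lookup_or_create(8)    # hit: same index value as the first request
```

Two read requests that carry the same source register number receive the same target index value, so the reservation station can later address a single shared copy of the register value.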

3. The method of claim 1, wherein the cache unit obtaining a first register value corresponding to the second read request from the register file comprises:

when the first register value corresponding to the source register number carried by the second read request does not exist locally, the cache unit obtains the first register value from the source register in the register file that corresponds to the source register number carried by the second read request;

the cache unit stores the first register value locally according to the correspondence between the source register number and the target index value.
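A minimal Python sketch of the fill-on-miss behaviour in claim 3 follows. The class `FillOnMissCache`, the `fetch` method, the `regfile_reads` counter, and the modelling of the register file as a dictionary are all assumptions made for illustration only.

```python
class FillOnMissCache:
    """Toy cache unit that reads the register file only when no local copy exists."""

    def __init__(self, register_file):
        self.register_file = register_file   # source register number -> register value
        self.data = {}                       # target index value -> cached register value
        self.regfile_reads = 0               # counts accesses to the register file

    def fetch(self, src_reg_num, target_index):
        """Return the register value, filling the local entry from the register file on a miss."""
        if target_index not in self.data:
            # The value does not exist locally: read the source register once
            # and store it under the target index value.
            self.data[target_index] = self.register_file[src_reg_num]
            self.regfile_reads += 1
        return self.data[target_index]


register_file = {8: 0xAAAA, 9: 0xBBBB}
cache = FillOnMissCache(register_file)
v1 = cache.fetch(8, target_index=0)   # miss: one register file read
v2 = cache.fetch(8, target_index=0)   # hit: served from the local copy
v3 = cache.fetch(9, target_index=1)   # miss: one more register file read
```

In this sketch three requests cause only two register file reads, which is the kind of read port saving the method aims for.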

4. The method of claim 1, wherein the source registers comprise vector source registers; the reservation station splitting the target vector instruction to obtain at least two first vector elements comprises:

upon receiving the target vector instruction, the reservation station determines the vector source registers corresponding to the target vector instruction;

the reservation station splits the target vector instruction according to the vector source registers to obtain the first vector elements; the number of first vector elements is the same as the number of vector source registers.
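The first-level split in claim 4 can be illustrated with a short Python sketch that produces one first vector element per vector source register. The `FirstVectorElement` record, the function `split_by_source_register`, and the example opcode and register numbers are hypothetical; the claim does not specify how a first vector element is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstVectorElement:
    """One piece of a vector instruction, tied to a single vector source register."""
    opcode: str
    src_reg: int   # vector source register number carried by the read request


def split_by_source_register(opcode, vector_src_regs):
    """Create one first vector element for each vector source register of the instruction."""
    return [FirstVectorElement(opcode, reg) for reg in vector_src_regs]


# Hypothetical instruction with two vector source registers, v2 and v1.
elements = split_by_source_register("vadd.vv", [2, 1])
```

The resulting list has exactly as many entries as the instruction has vector source registers, matching the count stated in the claim.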

5. The method according to claim 4, wherein the method further comprises:

when the reservation station receives the target index value corresponding to the first vector element, the reservation station performs a secondary split on the first vector element according to the length of the vector source register and the nominal width of the vector elements in the vector source register to obtain at least two second vector elements.
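One plausible reading of the secondary split in claim 5 is that the number of second vector elements equals the register length divided by the element width. The Python sketch below works under that assumption; the function `secondary_split`, its parameter names, and the use of a bit offset to identify each second vector element are hypothetical.

```python
def secondary_split(reg_length_bits, element_width_bits):
    """Split one first vector element into (element_number, bit_offset) pairs.

    Assumption: the register holds reg_length_bits // element_width_bits
    equally sized elements, packed from bit 0 upward.
    """
    if element_width_bits <= 0 or reg_length_bits % element_width_bits != 0:
        raise ValueError("register length must be a positive multiple of the element width")
    count = reg_length_bits // element_width_bits
    return [(i, i * element_width_bits) for i in range(count)]


# A 128-bit vector source register holding 32-bit elements yields four second vector elements.
second_elements = secondary_split(128, 32)
```

Each pair records which slice of the cached register value the corresponding second vector element would need when it executes.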

6. The method of claim 5, wherein the second register value comprises a third register value corresponding to the second vector element; the reservation station obtaining, according to the target index value, a second register value corresponding to the first vector element from the cache unit when the execution condition of the first vector element is satisfied comprises:

when the execution condition of the second vector element is satisfied, the reservation station acquires address offset information corresponding to any one of the first vector elements;

the reservation station acquires, from the cache unit, a third register value corresponding to the second vector element according to the target index value and the address offset information.
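The indexed read in claim 6 can be sketched in Python as selecting a cached register value by target index value and then extracting a slice at the given offset. The function `read_element`, the representation of the address offset information as a bit offset plus an element width, and the example values are assumptions made for illustration.

```python
def read_element(cache_data, target_index, bit_offset, element_width_bits):
    """Extract one element from the cached register value selected by target_index."""
    register_value = cache_data[target_index]   # whole cached vector register value
    mask = (1 << element_width_bits) - 1        # keeps element_width_bits low bits
    return (register_value >> bit_offset) & mask


# Hypothetical cache contents: index 0 holds a 128-bit value made of four 32-bit elements.
cache_data = {0: 0x44444444_33333333_22222222_11111111}
third_value = read_element(cache_data, target_index=0, bit_offset=64, element_width_bits=32)
```

Because the offset selects a slice of a value that is already cached, the second vector elements of one register can all be served without further register file reads.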

7. The method of claim 1, wherein the processor further comprises a renaming module for allocating physical registers to vector instructions in the reservation station; the method further comprises:

when the physical register corresponding to the target vector instruction changes, the renaming module sends a synchronization signal to the cache unit, the synchronization signal indicating the register number of the changed physical register;

the cache unit releases the locally stored register value corresponding to the register number indicated by the synchronization signal.

8. A processor, comprising a register file, a reservation station, and a cache unit disposed in the reservation station;

9. The processor according to claim 8, wherein the cache unit is specifically configured to:

10. The processor according to claim 8, wherein the cache unit is specifically configured to:

11. The processor of claim 8, wherein the source register comprises a vector source register; the reservation station is specifically configured to:

12. The processor of claim 11, wherein the reservation station is further configured to:

13. The processor of claim 12, wherein the second register value comprises a third register value corresponding to the second vector element; the reservation station is specifically configured to:

14. An electronic device, comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other via the communication bus; the memory is configured to store executable instructions that cause the processor to perform the memory access method of any one of claims 1 to 7.

15. A readable storage medium, characterized in that instructions in the readable storage medium, when executed by a processor of an electronic device, enable the processor to perform the memory access method of any one of claims 1 to 7.