CN117170747A - Program and instruction processing, training and predicting method and device, and processor
- Publication number: CN117170747A (application number CN202311095296.5A)
- Authority: CN (China)
- Prior art keywords: instruction, branch, cache, micro-instruction, build
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The embodiments of the disclosure provide a program processing method and apparatus, an instruction processing method, an instruction fetch mode training method, an instruction fetch mode prediction method, and a processor. The program processing method includes: creating an OP_BUILD_START instruction at a first position in a target program; and creating an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction at a second position after the first position in the target program. The OP_BUILD_START instruction is used to instruct the processor to train instructions following the OP_BUILD_START instruction into the micro instruction cache, and the OP_BUILD_END instruction is used to inform the processor to stop training instructions following the OP_BUILD_END instruction into the micro instruction cache. The program processing method improves the utilization of the micro instruction cache, improves the efficiency and flexibility of instruction fetch mode switching, reduces processor hardware complexity and power consumption, and thereby improves system performance.
Description
Technical Field
One or more embodiments of the present disclosure relate to a program processing method and apparatus, an instruction processing method, a training method of an instruction fetch mode, a prediction method of an instruction fetch mode, and a processor.
Background
Modern processors typically employ pipeline (Pipeline) techniques to process instructions in parallel and improve instruction processing efficiency. The processor core includes a plurality of pipeline stages: for example, after program counters from various sources feed into the pipeline, a next Program Counter (PC) is selected by a multiplexer (Mux), and the instruction corresponding to that program counter goes through branch prediction (Branch Prediction), instruction fetch (Instruction Fetch), instruction decode (Decode), instruction dispatch and rename (Dispatch and Rename), instruction execution (Execute), instruction retirement (Retire), and so on.
When processing branch instructions, branch prediction (Branch Prediction) techniques may be employed to avoid waiting for the execution result of a branch instruction to determine the branch direction. Branch prediction produces a prediction result, such as the branch direction, which drives the processor to perform the next instruction fetch operation, thereby avoiding the pipeline stall caused by waiting for the branch instruction's execution result. Modern processors typically employ branch prediction techniques with multiple levels of branch prediction logic; the more levels of branch prediction logic accessed during branch prediction, the higher the prediction accuracy.
The processor core translates each architecture instruction (instruction) into one or more micro-instructions (uOps) in the micro-architecture. Each micro-instruction performs only a limited operation, which ensures that different micro-instructions can be sent to their corresponding execution units; at the same time, the execution pipeline of each micro-instruction is relatively short, and multiple micro-instructions can be executed in parallel and out of order, thereby improving the performance of the processor core.
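As a concrete illustration of this decomposition, the following C sketch models a micro-instruction record and notes how a single x86 memory-add instruction is commonly split into micro-instructions; the type and field names are assumptions for illustration only, not the patent's concrete design.

```c
/* A minimal illustrative sketch of how one architecture instruction may
 * map to several micro-instructions (uOps); names are hypothetical. */
#include <stdint.h>

typedef enum { UOP_LOAD, UOP_ALU_ADD, UOP_STORE } uop_kind;

typedef struct {
    uop_kind kind;  /* which execution unit this uOp targets */
    uint8_t  src1;  /* source register/temporary identifier  */
    uint8_t  src2;
    uint8_t  dst;   /* destination register/temporary        */
} uop;

/* For instance, an x86 memory-add such as "add [mem], eax" is commonly
 * decomposed into three short uOps that can be scheduled out of order:
 *   UOP_LOAD    tmp <- [mem]
 *   UOP_ALU_ADD tmp <- tmp + eax
 *   UOP_STORE   [mem] <- tmp
 */
```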
Disclosure of Invention
At least one embodiment of the present disclosure provides a program processing method, including: creating an OP_BUILD_START instruction at a first position in a target program; and creating an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction at a second position after the first position in the target program, wherein the OP_BUILD_START instruction is used to instruct a processor to train instructions following the OP_BUILD_START instruction into a micro instruction cache, and the OP_BUILD_END instruction is used to inform the processor to stop training instructions following the OP_BUILD_END instruction into the micro instruction cache.
For example, the program processing method provided in at least one embodiment of the present disclosure further includes: determining an object loop body in the target program, wherein an entry position of the object loop body is taken as the first position, and an exit position of the object loop body is taken as the second position.
For example, in a program processing method provided in at least one embodiment of the present disclosure, the target program includes a first loop body and a second loop body, and determining the object loop body in the target program includes: determining the first loop body as the object loop body in response to the number of executions or the number of executed instructions of the first loop body being greater than that of the second loop body.
For example, the program processing method provided in at least one embodiment of the present disclosure further includes: determining an object loop body in the target program; and, in response to the number of program instructions of the object loop body being greater than the storage capacity of the micro instruction cache, and in response to the difference of the number of program instructions of the object loop body minus the number of program instructions stored in the micro instruction cache reaching a threshold, taking the entry position of the object loop body as the first position and determining the second position inside the object loop body.
For example, in a program processing method provided in at least one embodiment of the present disclosure, the OP_BUILD_START instruction and the OP_BUILD_END instruction are both null instructions.
For example, in a program processing method provided by at least one embodiment of the present disclosure, a compiler is used to create the OP_BUILD_START instruction at the first position and the OP_BUILD_END instruction at the second position.
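For illustration, the following C sketch shows where such a compiler (or a programmer) might place the two marker instructions around a hot loop. The intrinsics __op_build_start()/__op_build_end() are hypothetical stand-ins for the OP_BUILD_START/OP_BUILD_END opcodes and do not name any real toolchain feature.

```c
void __op_build_start(void);  /* hypothetical intrinsic emitting OP_BUILD_START */
void __op_build_end(void);    /* hypothetical intrinsic emitting OP_BUILD_END   */

void scale(float *v, int n, float k)
{
    __op_build_start();           /* first position: entry of the loop body */
    for (int i = 0; i < n; i++)   /* loop body to be trained into the OC    */
        v[i] *= k;
    __op_build_end();             /* second position: exit of the loop body */
}
```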
At least one embodiment of the present disclosure provides an instruction processing method, including: in response to identifying an OP_BUILD_START instruction during execution of a target program, training instructions following the OP_BUILD_START instruction into a micro instruction cache; and, in response to identifying an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction during execution of the target program, stopping training instructions following the OP_BUILD_END instruction into the micro instruction cache, wherein the OP_BUILD_START instruction is at a first position in the target program, and the OP_BUILD_END instruction is at a second position after the first position in the target program.
For example, in an instruction processing method provided by at least one embodiment of the present disclosure, the target program includes an object loop body, an entry position of the object loop body is the first position, and an exit position of the object loop body is the second position.
For example, the instruction processing method provided in at least one embodiment of the present disclosure further includes: in response to a micro instruction cache fetch miss for a first object instruction following the OP_BUILD_START instruction, entering an instruction cache fetch mode and determining whether the first object instruction jumps out of the object loop body; and, in response to the first object instruction not jumping out of the object loop body, continuing to train the first object instruction and the instructions following it into the micro instruction cache.
At least one embodiment of the present disclosure provides a training method for an instruction fetch mode of a processor. The training method includes, during execution of a target program that includes an OP_BUILD_START instruction and an OP_BUILD_END instruction: acquiring branch instruction information of an object branch instruction immediately preceding the OP_BUILD_END instruction; determining, according to the branch instruction information, whether the object branch instruction jumps, so as to determine whether to update the confidence value of the start micro instruction cache fetch mode corresponding to the object branch instruction; and reading the branch prediction buffer entry corresponding to the object branch instruction and writing the updated confidence value of the start micro instruction cache fetch mode, wherein the OP_BUILD_START instruction is used to instruct the processor to train instructions following the OP_BUILD_START instruction into the micro instruction cache, the OP_BUILD_END instruction is used to inform the processor to stop training instructions following the OP_BUILD_END instruction into the micro instruction cache, the OP_BUILD_START instruction is at a first position of the target program, and the OP_BUILD_END instruction is at a second position after the first position in the target program.
For example, in a training method provided in at least one embodiment of the present disclosure, acquiring the branch instruction information of the object branch instruction immediately preceding the OP_BUILD_END instruction includes: determining the branch instruction information of the object branch instruction when the OP_BUILD_END instruction reaches the release stage.
For example, in the training method provided in at least one embodiment of the present disclosure, the target program includes an object loop body, an entry position of the object loop body is the first position, and an exit position of the object loop body is the second position.
For example, the training method provided in at least one embodiment of the present disclosure further includes: in response to the instruction length of the loop body being less than or equal to the capacity of the micro instruction cache, taking the branch instruction immediately preceding the OP_BUILD_END instruction in the micro instruction cache as the object branch instruction; or, in response to the instruction length of the loop body being greater than the capacity of the micro instruction cache, taking the branch instruction whose branch prediction target address is the instruction address of the OP_BUILD_START instruction as the object branch instruction when all instructions of the loop body are released.
For example, in a training method provided in at least one embodiment of the present disclosure, determining, according to the branch instruction information, whether the object branch instruction jumps, so as to determine whether to update the confidence value of the start micro instruction cache fetch mode corresponding to the object branch instruction, includes: when the execution direction of the object branch instruction is determined to be not-jump according to the branch instruction information, determining to decrease the confidence value of the start micro instruction cache fetch mode; and, when the execution direction of the object branch instruction is determined to be jump according to the branch instruction information, determining to increase the confidence value of the start micro instruction cache fetch mode.
For example, in the training method provided in at least one embodiment of the present disclosure, writing the updated confidence value of the start micro instruction cache fetch mode includes: when the OP_BUILD_END instruction is released, writing the updated confidence value of the start micro instruction cache fetch mode into the branch prediction buffer entry corresponding to the object branch instruction.
At least one embodiment of the present disclosure provides a prediction method for an instruction fetch mode, including: acquiring an object branch instruction in a target program that includes an OP_BUILD_START instruction and an OP_BUILD_END instruction; and querying the object branch instruction in a branch prediction buffer to determine whether to perform branch prediction on the object branch instruction and whether to enter a micro instruction cache fetch mode for the object instruction, wherein an entry of the branch prediction buffer includes a confidence value of the start micro instruction cache fetch path for predicting whether to start the micro instruction cache fetch mode, the OP_BUILD_START instruction is used to instruct the processor to train instructions following the OP_BUILD_START instruction into the micro instruction cache, the OP_BUILD_END instruction is used to inform the processor to stop training instructions following the OP_BUILD_END instruction into the micro instruction cache, the OP_BUILD_START instruction is at a first position of the target program, and the OP_BUILD_END instruction is at a second position after the first position in the target program.
For example, in the prediction method provided in at least one embodiment of the present disclosure, in response to the branch prediction direction of the object branch instruction being jump and the confidence value of the start micro instruction cache fetch mode being greater than a set value, the micro instruction cache fetch mode is entered after the branch target address of the object branch instruction arrives; and, in response to the branch prediction direction of the object branch instruction being not-jump, or the confidence value of the start micro instruction cache fetch mode being less than or equal to the set value, the current fetch mode is kept after the branch target address of the object branch instruction arrives.
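A minimal sketch of this decision rule, assuming the branch prediction buffer entry carries a small confidence counter and a single comparison threshold (the "set value"); all names and widths are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define OC_CONF_SET_VALUE 2      /* the "set value" in the text (assumed) */

typedef struct {
    bool    predicted_taken;     /* branch prediction direction           */
    uint8_t oc_confidence;       /* confidence of starting OC fetch mode  */
} btb_entry;

/* Returns true if the front end should switch to the OC fetch mode once
 * the branch target address of the object branch instruction arrives;
 * otherwise the current fetch mode is kept. */
bool enter_oc_fetch_mode(const btb_entry *e)
{
    return e->predicted_taken && e->oc_confidence > OC_CONF_SET_VALUE;
}
```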
At least one embodiment of the present disclosure provides a program processing apparatus, including: a first creation module configured to create an OP_BUILD_START instruction at a first position in a target program; and a second creation module configured to create an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction at a second position after the first position in the target program, wherein the OP_BUILD_START instruction is used to instruct a processor to train instructions following the OP_BUILD_START instruction into a micro instruction cache, and the OP_BUILD_END instruction is used to inform the processor to stop training instructions following the OP_BUILD_END instruction into the micro instruction cache.
At least one embodiment of the present disclosure provides a processor including a decode unit and a micro instruction cache, wherein the decode unit is configured to, in response to identifying an OP_BUILD_START instruction during execution of a target program, train instructions following the OP_BUILD_START instruction into the micro instruction cache, and, in response to identifying an OP_BUILD_END instruction during execution of the target program, stop training instructions following the OP_BUILD_END instruction into the micro instruction cache, wherein the OP_BUILD_START instruction is at a first position in the target program, and the OP_BUILD_END instruction is at a second position after the first position in the target program.
For example, the processor provided by at least one embodiment of the present disclosure further includes a branch prediction unit and a branch prediction buffer, wherein the branch prediction unit is configured to: acquire branch instruction information of an object branch instruction immediately preceding the OP_BUILD_END instruction, determine whether the object branch instruction jumps according to the branch instruction information so as to determine whether to update the confidence value of the start micro instruction cache fetch mode corresponding to the object branch instruction, read the branch prediction buffer entry corresponding to the object branch instruction, and write the updated confidence value of the start micro instruction cache fetch mode.
For example, in a processor provided by at least one embodiment of the present disclosure, the branch prediction unit is further configured to: acquire an object branch instruction in a target program that includes an OP_BUILD_START instruction and an OP_BUILD_END instruction, and query the object branch instruction in the branch prediction buffer to determine whether to perform branch prediction on the object branch instruction and whether the object instruction enters the micro instruction cache fetch mode.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments are briefly described below. It is apparent that the drawings described below relate only to some embodiments of the present disclosure and are not intended to limit the present disclosure.
FIG. 1A shows a schematic diagram of a pipeline of a processor core;
FIG. 1B shows a schematic diagram of a front-end architecture of a processor;
FIG. 2 illustrates an example of some loop bodies in a target program;
FIG. 3 shows an example of a case where two loop bodies are nested in a target program;
FIG. 4 illustrates a flow chart of an example instruction processing method in accordance with at least one embodiment of the present disclosure;
FIG. 5 illustrates a flow chart of an example instruction fetch mode training method in accordance with at least one embodiment of the present disclosure;
FIG. 6 illustrates a flow chart of an example instruction fetch mode prediction method in accordance with at least one embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of an electronic device provided in at least one embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used in this disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
The present disclosure is illustrated by the following several specific examples. Detailed descriptions of known functions and known parts (elements) may be omitted for the sake of clarity and conciseness in the following description of the embodiments of the present disclosure. When any part (element) of an embodiment of the present disclosure appears in more than one drawing, the part (element) is denoted by the same or similar reference numeral in each drawing.
In this disclosure, the description is generally directed to "processor cores", but for convenience "processor cores" are also referred to simply as "processors", i.e., both "processor" and "processor cores" are equivalent in this disclosure at least in terms of instruction processing pipelines.
FIG. 1A shows a schematic diagram of a pipeline of a processor core, with the dashed lines with arrows representing redirected instruction flow. As shown in FIG. 1A, a processor core (e.g., a CPU core) of a single-core or multi-core processor improves instruction-level parallelism (Instruction Level Parallelism) by pipelining. The processor core includes a plurality of pipeline stages: for example, after program counters from various sources feed into the pipeline, a next Program Counter (PC) is selected by a multiplexer (Mux), and the instruction corresponding to that program counter goes through branch prediction (Branch Prediction), instruction fetch (Instruction Fetch), instruction decode (Decode), instruction dispatch and rename (Dispatch and Rename), instruction execution (Execute), instruction retirement (Retire), and so on. Wait queues, typically first-in-first-out (FIFO) queues, are provided as needed between the pipeline stages. For example, after the branch prediction unit, a branch prediction (BP) FIFO queue is provided to store branch prediction results; after the instruction fetch unit, an instruction cache (Instruction Cache, IC) FIFO is provided to buffer fetched instructions; after the instruction decode unit, a decode (DE) FIFO is provided to buffer decoded instructions; and after the instruction dispatch and rename unit, a retire (RT) FIFO is provided to buffer executed instructions waiting for retirement confirmation. The pipeline of the processor core also includes an instruction queue after instruction dispatch and rename, which buffers instructions waiting for the execution units to execute them.
A scalar processor (CPU) instruction pipeline may be a five-stage pipeline in which one instruction can issue per clock cycle and complete within a fixed period (e.g., 5 clock cycles). Execution of each instruction is divided into 5 steps: an instruction fetch (IF) stage, a decode (DE) stage, an execute (EX) stage, a memory access (MEM) stage, and a write back (WB) stage. For example, a superscalar processor may further support out-of-order execution, a technique by which the CPU allows multiple instructions to be processed by the corresponding circuit units out of the order specified by the program.
The pipeline of a processor may generally be divided into a front end and a back end. FIG. 1B shows a schematic diagram of the front-end architecture of a processor. As shown in FIG. 1B, the front-end architecture 10 of the processor includes a branch predictor 101, a branch instruction prediction information queue 102, an instruction cache (IC) 103, a decoder 104, a micro instruction processing module 105, a micro instruction cache (OC) 106, a micro instruction queue 107, and a dispatcher (instruction issue unit) 108.
The process of instruction fetching, decoding, and issuing in the front-end architecture of the processor is described below in connection with FIG. 1B. In this process, the branch predictor 101 first sends instruction fetch prediction information to the branch instruction prediction information queue 102 for caching, where it awaits processing.
The instruction address obtained by the instruction fetch unit is predicted through branch prediction to obtain the instruction address to be executed next. The instruction fetch mode selection logic then determines whether the instruction corresponding to that instruction address requires instruction decoding. If yes, the left path in FIG. 1B is taken to decode the instruction; if not, the right path in FIG. 1B is taken, the instruction is not decoded, and the micro instruction cache is accessed to directly obtain the corresponding micro instruction group data.
For this fetch mode selection logic, the processor initially enables, for example, the instruction cache fetch mode (IC fetch mode) to process the prediction information. First, based on the prediction address information in the prediction information from the branch instruction prediction information queue (or branch target buffer) 102, the instruction data requested by the prediction information is fetched from the instruction cache 103 and sent to the decoder 104 for decoding. The instruction data here may be continuous binary data. The decoder 104 may decode the fetched instruction data into corresponding micro instruction groups (e.g., each micro instruction group includes one or more micro instructions) and send the micro instruction groups to the micro instruction queue 107 to be cached while awaiting dispatch (not shown in FIG. 1B).
The decoder 104 also sends the decoded micro instruction group to the micro instruction cache 106 for caching, i.e., it "trains" the decoded micro instructions into the micro instruction cache. At this point, a micro instruction cache entry (entry) may be created in the micro instruction cache 106 for storing the micro instructions. One or more micro instructions in the micro instruction group are cached in the created entry; for example, one micro instruction cache entry may store 8 micro instructions. When micro instructions are cached, the micro instruction cache 106 determines whether they are already present in it. For example, when a micro instruction is already present in the micro instruction cache 106, the micro instruction cache 106 provides cache hit (Hit) information, and when a micro instruction is not present, it provides cache miss (Miss) information.
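The following struct is a hedged sketch of such a micro instruction cache entry (one entry holding up to 8 micro-instructions, as stated above); the field widths and names are assumptions, not the patent's concrete layout.

```c
#include <stdint.h>

#define UOPS_PER_ENTRY 8

typedef struct {
    uint64_t tag;                  /* derived from the instruction address */
    uint8_t  valid_count;          /* number of micro-instructions stored  */
    uint64_t uops[UOPS_PER_ENTRY]; /* encoded micro-instructions           */
} oc_entry;
```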
The controller in the processor determines whether to enable the micro instruction cache fetch mode (OC fetch mode) based on the cache hit or cache miss information provided by the micro instruction cache 106. In one embodiment, for example, when the micro instruction cache 106 indicates that several consecutive micro instruction groups are all present in the micro instruction cache 106, the determination result is yes and the OC fetch mode is enabled; when the determination result is no, the prediction information continues to be processed in the IC fetch mode.
In response to enabling the OC fetch mode, the prediction information in the branch instruction prediction information queue 102 is sent to a micro instruction cache fetch queue contained in the micro instruction processing module 105 and then to the micro instruction cache 106. The micro instruction cache 106 determines, according to the address information in the prediction information, whether the micro instruction group corresponding to the prediction information can be fetched from the micro instruction cache 106. For example, in response to failing to fetch the micro instruction group corresponding to the prediction information, the system returns to the IC fetch mode and processes the currently missing prediction information in the IC fetch mode.
In response to being able to fetch the micro instruction group corresponding to the prediction information, the fetched micro instruction group is sent to the micro instruction queue 107 to await dispatch. The micro instruction queue 107 sequentially directs micro instruction groups processed from the instruction cache (IC) fetch mode or the micro instruction cache (OC) fetch mode to the dispatcher 108 for back-end execution, e.g., register renaming, execution, retirement (Retire), etc.
For branch prediction techniques, information about branch instructions may be stored in a branch target buffer (Branch Target Buffer, BTB). In current designs, BTBs are mostly multi-way set-associative structures, generally using instruction address information (or a hashed instruction address) as the index address. Meanwhile, the instruction address information (or a hashed instruction address) is used as a tag for comparison to determine whether the instruction address hits a certain entry (or data item) of the BTB.
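A minimal sketch of such a set-associative BTB lookup, using the instruction address both as a (trivially hashed) index and as the comparison tag; the table sizes and the hash are assumptions, not the patent's design.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTB_SETS 256
#define BTB_WAYS 4

typedef struct {
    bool     valid;
    uint64_t tag;
    uint64_t target;     /* predicted branch target address */
} btb_way;

static btb_way btb[BTB_SETS][BTB_WAYS];

bool btb_lookup(uint64_t pc, uint64_t *target)
{
    uint64_t set = (pc >> 2) % BTB_SETS;  /* simple index hash (assumed) */
    for (int w = 0; w < BTB_WAYS; w++) {
        if (btb[set][w].valid && btb[set][w].tag == pc) {
            *target = btb[set][w].target; /* hit: entry found            */
            return true;
        }
    }
    return false;                         /* miss                        */
}
```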
The inventors of the present disclosure have noted that prediction systems involving, for example, an OC fetch mode (or fetch path) mostly rely on training of the micro instruction cache, i.e., they predict the hit rate of the micro instruction cache to predict whether the instruction fetch system needs to enter the OC fetch mode. For example, a microsystem mirroring the micro instruction cache organization may be created in the instruction fetch module, and whether the system enters the OC fetch mode is predicted by looking up this microsystem. Such a design also needs to take branch instruction information into account; for example, the microsystem is looked up and the OC fetch mode is selected only when the current instruction is the jump target instruction of a branch instruction. During training of the micro instruction cache, most micro instruction information is stored into the micro instruction cache while the system fetches along the IC fetch path, i.e., through the instruction decoding circuit.
In the above design, the fetch path prediction system must additionally maintain a microsystem device having the same organization as the micro instruction cache. This device increases chip area, and the reading and training of the microsystem also increase system power consumption.
During training of the micro instruction cache, the design aims to improve the hit rate of the micro instruction cache, so all micro instruction information obtained in the IC fetch mode is trained into the micro instruction cache. However, whether the OC fetch mode is entered largely depends on the program itself. The characteristics of some program segments determine that they will not enter the OC fetch mode; micro instruction cache training for such program segments is therefore meaningless. In this state, the training power consumption of the micro instruction cache can be relatively large while the efficiency of the OC fetch mode remains low. Moreover, since the instruction behavior of computer programs is often very complex, hardware systems are forced to adapt to and perceive that complex instruction behavior, which requires the hardware to maintain a complex system that is not friendly to all software program behaviors.
In view of the above problems, one or more embodiments of the present disclosure provide a program processing method and apparatus, an instruction processing method, an instruction fetch mode training method, an instruction fetch mode prediction method, and a processor, which adopt a combined software-hardware approach to improve the efficiency of the OC fetch mode while reducing the power consumption of micro instruction cache training, the hardware design overhead, and the manufacturing cost.
One or more embodiments of the present disclosure provide a program processing method including: creating an OP_BUILD_START instruction at a first position in the target program; and creating an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction at a second position located after the first position in the target program. Here, the OP_BUILD_START instruction is used to instruct the processor to train instructions following the OP_BUILD_START instruction into the micro instruction cache, and the OP_BUILD_END instruction is used to inform the processor to stop training instructions following the OP_BUILD_END instruction into the micro instruction cache.
In the above embodiments, at least two micro instruction cache (OC) fetch mode prediction instructions are introduced, including the OP_BUILD_START instruction and the OP_BUILD_END instruction; the former is used to indicate the start of OC fetch mode training, and the latter is used to indicate the end of OC fetch mode training.
For example, the micro instruction cache fetch mode prediction instructions described above may be created in the instruction set, thereby providing a micro-architecture that supports the micro instruction cache fetch mode prediction instructions and a processor having that micro-architecture. For example, taking the X86 instruction set as an example, two different OPCODE encodings may be defined for the OP_BUILD_START instruction and the OP_BUILD_END instruction, respectively. Embodiments of the present disclosure are equally applicable to other types of instruction sets, such as ARM, RISC-V, MIPS, and the like.
For example, in at least one example, the OP_BUILD_START instruction and the OP_BUILD_END instruction are null instructions (NOPs); e.g., they are sent to a fixed-point execution unit when executed and perform no specific operation. In other examples, one or more micro instruction cache fetch mode prediction registers are provided; the OP_BUILD_START instruction, when executed, modifies such a register to indicate that the OC fetch mode prediction state is currently entered (e.g., modifying the value of the register to 1), and the OP_BUILD_END instruction, when executed, modifies the register to indicate that the OC fetch mode prediction state is currently exited (e.g., modifying the value of the register to 0).
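The register-based variant can be sketched as follows, assuming a single one-bit prediction register toggled when the two marker instructions execute; the names here are hypothetical.

```c
#include <stdbool.h>

static bool oc_mode_pred_reg = false; /* 0: exited, 1: entered (assumed width) */

void exec_op_build_start(void) { oc_mode_pred_reg = true;  /* enter state */ }
void exec_op_build_end(void)   { oc_mode_pred_reg = false; /* exit state  */ }

/* The decode unit can consult oc_mode_pred_reg to decide whether decoded
 * micro-instructions should also be trained into the micro instruction cache. */
```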
In the embodiments of the program processing method described above, there is no limitation on the form of the "target program" that serves as the processing object; it may be a computer program in machine language, assembly language, or a high-level language. The OP_BUILD_START instruction and the OP_BUILD_END instruction are inserted/created at predetermined positions in the target program, for example, by a compiler when compiling the computer program, or by a programmer's manual operation. For example, the OP_BUILD_START instruction and the OP_BUILD_END instruction occur in pairs, and the same target program may include multiple pairs of OP_BUILD_START and OP_BUILD_END instructions spaced apart from each other, i.e., the instruction sequences between the OP_BUILD_START and OP_BUILD_END instructions of different pairs do not overlap.
The OP_BUILD_START instruction is set at a first position in the target program, and the OP_BUILD_END instruction is set at a second position in the target program, the second position being after the first position. Thus, instructions located between the OP_BUILD_START instruction and the OP_BUILD_END instruction in the target program will be trained into the micro instruction cache.
In at least one embodiment of the present disclosure, the first position and the second position are determined based on a loop body in the target program. Instructions in a loop body may be executed more than once, so training the instructions in the loop body into the micro instruction cache can improve instruction execution efficiency. A loop body (or loop structure) is a program structure provided in a program where a certain function needs to be executed repeatedly; it can be regarded as the combination of a condition judgment statement and a jump-back statement. A loop structure is typically composed of three elements, namely a loop variable, a loop body, and a loop termination condition. Three common loop structures are the for loop, the while loop, and the do-while loop, as illustrated below. The while loop and the for loop first evaluate the judgment expression and then execute the loop body, while the do-while loop first executes the loop body and then evaluates the expression; details are not repeated here.
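The sketch below shows two of these loop forms in C; in each case, the statements inside the loop are the candidates for micro instruction cache training, and the do-while form makes explicit the backward conditional branch at the loop bottom.

```c
int sum_for(const int *a, int n)      /* for loop: test first, then body */
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

int sum_do_while(const int *a, int n) /* do-while: body first, then test; */
{                                     /* compiles to a backward (Taken)   */
    int s = 0, i = 0;                 /* conditional branch at the bottom; */
    do {                              /* assumes n >= 1                    */
        s += a[i++];
    } while (i < n);
    return s;
}
```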
For example, in at least one embodiment of the present disclosure, the program processing method further includes: determining the object loop body in the target program, taking the entry position of the object loop body as the first position and the exit position of the object loop body as the second position. The "object loop body" here is a loop body taken as the object of description and may refer to any type of loop body.
For another example, in at least one embodiment of the present disclosure, the program processing method further includes: determining an object loop body in the target program; and, in response to the number of program instructions of the object loop body being greater than the storage capacity of the micro instruction cache, and in response to the difference (the number of program instructions of the object loop body minus the number of program instructions stored by the micro instruction cache) reaching a threshold, determining the first position inside the object loop body and taking the exit position of the object loop body as the second position. Typically, the storage capacity of the micro instruction cache is fixed, so in some cases the number of program instructions in the object loop body may be greater than the storage capacity of the micro instruction cache, and the object loop body therefore cannot be completely cached in the micro instruction cache; when the number of program instructions of the object loop body exceeds a certain amount, the starting position of training (i.e., the first position) may be chosen inside the object loop body. For example, the threshold may be set as desired; a minimal sketch of this placement rule is given below.
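This sketch assumes illustrative capacity and threshold constants; neither value comes from the patent.

```c
#include <stdbool.h>

#define OC_CAPACITY_INSNS 2048   /* OC storage capacity in instructions (assumed) */
#define PLACE_THRESHOLD   256    /* the "threshold" in the text (assumed)         */

/* Returns true if the training marker should be placed inside the loop
 * body rather than at its boundary, per the rule described above. */
bool marker_inside_loop(int loop_insns)
{
    return loop_insns > OC_CAPACITY_INSNS &&
           loop_insns - OC_CAPACITY_INSNS >= PLACE_THRESHOLD;
}
```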
For example, FIG. 2 shows examples of some loop bodies in a target program: the loop body in FIG. 2 (1) contains only 1 branch instruction, the loop body in FIG. 2 (2) only 2 branch instructions, the loop body in FIG. 2 (3) only 3 branch instructions, and the loop body in FIG. 2 (4) only n branch instructions (n greater than 3). Identifying an instruction loop body is not strictly tied to the number of branch instructions the loop body contains; rather, the loop body needs to include a branch instruction whose branch direction is determined to be jump (Taken, abbreviated "T", as opposed to Not Taken, abbreviated "NT") and whose jump destination address is smaller than the address of the branch instruction itself. Depending on the type of loop body (for loop, while loop, or do-while loop), the branch instruction may be at different positions within the loop body.
For example, in at least one embodiment of the present disclosure, the target program includes a plurality of loop bodies, e.g., a first loop body and a second loop body, and determining the object loop body in the target program includes: in response to the number of executions or the number of instructions of the first loop body being greater than that of the second loop body in the target program, determining the first loop body as the object loop body.
In some cases, multiple (e.g., 2 or more) loop bodies are nested or interleaved. Here one may choose to train only one or a portion of the loop bodies, or to train all of them, depending on the number of loop iterations (also referred to as the HOT degree) each loop body has during execution of the target program.
FIG. 3 shows a case where two loop bodies (i.e., loop body 1 and loop body 2) are nested in a target program. In this case, one may choose to train just loop body 1, just loop body 2, or both loop bodies, according to the HOT degree of the two loop bodies; for each of these choices, the first position and the second position for inserting the OP_BUILD_START instruction and the OP_BUILD_END instruction, respectively, are different.
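As an illustration of the nested case, the following C sketch marks only the hotter inner loop (loop body 2) for OC training, reusing the hypothetical intrinsics introduced earlier; had loop body 1 been hotter, the markers would instead bracket the outer loop.

```c
void __op_build_start(void);  /* hypothetical intrinsics, as before */
void __op_build_end(void);

void matmul_row(float *c, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i++) {         /* loop body 1 (outer)          */
        __op_build_start();               /* first position               */
        for (int j = 0; j < n; j++)       /* loop body 2 (inner, hotter): */
            c[i] += a[j] * b[j * n + i];  /* trained into the OC          */
        __op_build_end();                 /* second position              */
    }
}
```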
The target program, after the processing of any of the above embodiments is performed, may be stored or transmitted, and may be executed in, for example, a processor; the embodiments of the present disclosure do not limit the type, structure, and the like of the processor.
Correspondingly, at least one embodiment of the present disclosure provides a program processing apparatus including a first creation module and a second creation module. The first creation module is configured to create an OP_BUILD_START instruction at a first position in the target program; the second creation module is configured to create an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction at a second position located after the first position in the target program. For example, the first creation module and the second creation module may be implemented by software, firmware, hardware, or any combination thereof; when they are at least partly implemented in software, the program processing methods described above may be implemented when that software is executed by a processor.
For example, in at least one embodiment of the present disclosure, the program processing apparatus includes an object loop body determination module configured to: determine the object loop body in the target program, taking the entry position of the object loop body as the first position and the exit position of the object loop body as the second position.
For another example, in at least one embodiment of the present disclosure, the object loop body determination module is configured to: in response to the number of program instructions of the object loop body being greater than the storage capacity of the micro instruction cache, and in response to the difference between the number of program instructions of the object loop body and the number of program instructions stored in the micro instruction cache reaching a threshold, take the entry position of the object loop body as the first position and determine the second position inside the object loop body (inserting the OP_BUILD_END instruction into the object loop body).
For example, in at least one embodiment of the present disclosure, the object loop body determination module is further configured to: in response to the number of executions or the number of instructions of the first loop body being greater than that of the second loop body in the target program, determine the first loop body as the object loop body.
At least one embodiment of the present disclosure provides an instruction processing method, including: in response to identifying an OP_BUILD_START instruction during execution of the target program, training instructions following the OP_BUILD_START instruction into the micro instruction cache (OC); and, in response to identifying an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction during execution of the target program, stopping training instructions following the OP_BUILD_END instruction into the micro instruction cache.
As described above, instructions of the target program are fed into the processor for execution, and the processor typically uses pipeline (Pipeline) techniques to process the instructions in parallel so as to improve instruction processing efficiency. During execution, if the processor identifies that an OP_BUILD_START instruction currently needs to be executed, the other instructions after it need to be trained into the micro instruction cache, i.e., the one or more micro instructions (e.g., a micro instruction group) obtained by decoding each instruction are cached into the micro instruction cache; and, if the processor identifies during execution that an OP_BUILD_END instruction currently needs to be executed, the above training process is no longer performed, i.e., instructions after the OP_BUILD_END instruction stop being trained into the micro instruction cache. The instruction processing method described above is performed, for example, in the front end of a processor.
For example, in at least one embodiment of the present disclosure, in the above instruction processing method, the target program includes an object loop body, the entry position of the object loop body is the first position, and the exit position of the object loop body is the second position. Thus, the instructions within the loop are trained into the micro instruction cache.
As described above, during execution of the target program, one or more instructions to be processed may be executed in the IC or OC fetch mode. During the IC fetch mode, whether to perform OC training is determined according to the settings; during the OC fetch mode, there is no decoding process for the instruction to be processed and the corresponding micro-instructions are already cached in the OC, so OC training is not required. On the other hand, if an OC miss (Miss) occurs, the processor exits the OC fetch mode and returns to the IC fetch mode, and at the same time needs to determine whether the OC needs to continue to be trained. For example, when a snoop (SNOOP) operation occurs or a previous entry in the micro instruction cache is overwritten, an OC fetch may miss, thereby interrupting the OC fetch of the loop body.
For example, in at least one embodiment of the present disclosure, the instruction processing method further includes: in response to an OC fetch miss (Miss) for a first object instruction following the OP_BUILD_START instruction, entering the IC fetch mode and determining whether the first object instruction jumps out of the object loop body; and, in response to the first object instruction not jumping out of the object loop body, continuing to train the first object instruction and the instructions following it into the micro instruction cache. Subsequent instructions continue to be trained into the micro instruction cache until the OP_BUILD_END instruction is identified. For example, whether the loop body is jumped out of may be determined by detecting whether the corresponding instruction lies between the OP_BUILD_START instruction and the OP_BUILD_END instruction.
For example, in at least one embodiment of the present disclosure, the instruction processing method further includes: creating a first identification in response to determining that the first object instruction does not jump out of the object loop body; and, after the first object instruction is decoded, determining according to the first identification that the first object instruction and the instructions following it continue to be trained into the OC.
For example, in at least one embodiment of the present disclosure, where a micro instruction cache fetch mode prediction register is provided in the processor, whether OC training needs to continue may be determined by examining the current value of that register.
FIG. 4 illustrates a flow chart of an example instruction processing method in accordance with at least one embodiment of the present disclosure. As shown in FIG. 4, the processing method includes steps 410-470 described below.
The instruction address obtained by the instruction fetch unit is predicted through branch prediction to obtain the instruction address to be executed next (hereinafter referred to as the "current instruction address", whose corresponding instruction is the "current instruction"). The current instruction may, for example, be located in a loop body that is marked by an OP_BUILD_START instruction and an OP_BUILD_END instruction and requires OC training.
In step 410, the instruction fetch mode selection logic determines whether instruction decoding is required for the instruction corresponding to the current instruction address, proceeds to step 420 if the OC fetch mode is selected, and proceeds to step 440 if the IC fetch mode is selected.
At step 420, the OC fetch mode is entered, and the instruction address is used to query the micro instruction cache for the presence of corresponding micro-instructions. If present, the query hits, and the fetched micro-instructions are sent to the micro instruction queue, issued, and so on; otherwise the query misses (Miss), and the process proceeds to step 430.
At step 430, it is determined whether the current instruction address (i.e., the current instruction) jumps out of the loop body, and an identification of whether the loop body is jumped out of is recorded; the process then returns to step 410 and the IC fetch mode is selected.
For example, the identification is an example of the first identification in the present disclosure.
At step 440, the IC fetch mode is entered, and the instruction address is used to query the instruction cache for the presence of the current instruction. If present, the query hits and the process proceeds to step 450; otherwise the query misses (Miss), the current instruction is fetched from the lower-level cache or memory, and the process then proceeds to step 450.
At step 450, the current instruction is decoded to obtain one or more corresponding microinstructions.
At step 460, the current instruction is identified to confirm whether it is an OP_BUILD_START instruction or an OP_BUILD_END instruction, and the identification of whether the loop body is jumped out of is checked.
At step 470, it is determined whether to start or end training of the OC based on the identification result of step 460 and the identification (whether the loop body is jumped out of).
For example, if the current instruction is an OP_BUILD_START instruction, OC training is started; if the current instruction is an OP_BUILD_END instruction, OC training is ended; if the current instruction is neither an OP_BUILD_START instruction nor an OP_BUILD_END instruction and the loop body has not been jumped out of, OC training is (re)started. The one or more micro-instructions obtained by decoding the current instruction are filled into the micro instruction cache, and the subsequent execution steps are performed on the pipeline.
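The FIG. 4 flow can be condensed into the following C-style control skeleton; every helper here is a hypothetical stub standing in for front-end hardware, so only the control flow, not any concrete interface, reflects the description above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MARK_NONE, MARK_START, MARK_END } marker;

/* Hypothetical stubs for the front-end hardware actions. */
bool   select_oc_fetch_mode(uint64_t pc);    /* step 410 */
bool   oc_lookup_and_issue(uint64_t pc);     /* step 420 */
void   record_jump_out_flag(uint64_t pc);    /* step 430 */
void   fetch_from_ic_or_memory(uint64_t pc); /* step 440 */
void   decode_and_issue(uint64_t pc);        /* step 450 */
marker identify_marker(uint64_t pc);         /* step 460 */
bool   jumped_out_of_loop(uint64_t pc);
void   oc_training_on(void);
void   oc_training_off(void);
void   oc_train(uint64_t pc);

void frontend_step(uint64_t pc)
{
    if (select_oc_fetch_mode(pc)) {          /* step 410                 */
        if (oc_lookup_and_issue(pc))         /* step 420: OC hit         */
            return;
        record_jump_out_flag(pc);            /* step 430: OC miss, note  */
    }                                        /* whether loop was exited  */
    fetch_from_ic_or_memory(pc);             /* step 440                 */
    decode_and_issue(pc);                    /* step 450                 */
    switch (identify_marker(pc)) {           /* steps 460-470            */
    case MARK_START: oc_training_on();  break;
    case MARK_END:   oc_training_off(); break;
    case MARK_NONE:
        if (!jumped_out_of_loop(pc))
            oc_train(pc);                    /* (re)start OC training    */
        break;
    }
}
```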
In any of the embodiments described above, the OC fetch mode is trained; for example, the position at which the OC fetch mode is entered is the jump target position of a branch instruction, so information about the OC fetch mode can be trained into the BTB entry corresponding to that branch instruction, to be used subsequently for predicting the OC fetch mode.
At least one embodiment of the present disclosure provides a training method for an instruction fetch mode of a processor. The training method includes, during execution of a target program that includes an OP_BUILD_START instruction and an OP_BUILD_END instruction: acquiring branch instruction information of an object branch instruction immediately preceding the OP_BUILD_END instruction; determining, according to the branch instruction information, whether the object branch instruction jumps, so as to determine whether to update the confidence value of the start OC fetch mode corresponding to the object branch instruction; and reading the branch prediction buffer (BTB) entry corresponding to the object branch instruction and writing the updated confidence value of the start OC fetch mode.
Here, the "object branch instruction" refers to the branch instruction taken as the object of description, which is "immediately adjacent" to the OP_BUILD_END instruction in the instruction sequence of the target program, i.e., there are no other branch instructions between the object branch instruction and the OP_BUILD_END instruction.
For example, in at least one embodiment of the present disclosure, acquiring the branch instruction information of the object branch instruction immediately preceding the OP_BUILD_END instruction includes: determining the branch instruction information of the object branch instruction when the OP_BUILD_END instruction reaches the release (retire) stage. For example, the branch instruction information includes the branch execution direction, the branch jump destination address, and the like.
For example, in at least one embodiment of the present disclosure, the target program includes an object loop body, the entry position of the object loop body is the first position, and the exit position of the object loop body is the second position.
For example, in at least one embodiment of the present disclosure, the training method further includes: in response to the instruction length of the loop body being less than or equal to the capacity of the micro instruction cache, taking the branch instruction immediately preceding the OP_BUILD_END instruction in the micro instruction cache as the object branch instruction; or, in response to the instruction length of the loop body being greater than the capacity of the micro instruction cache, taking the branch instruction whose branch prediction target address is the instruction address of the OP_BUILD_START instruction as the object branch instruction when all instructions of the loop body are released.
In the above embodiment, for example, there may be two ways of capturing the branch instruction, corresponding to the following two cases. In case 1, when the instruction length of the loop body is less than or equal to the capacity of the micro instruction cache, the instruction immediately before the OP_BUILD_END instruction is a branch instruction, so that branch instruction can be captured easily. In case 2, when the instruction length of the loop body is greater than the capacity of the micro instruction cache, the instruction preceding the OP_BUILD_END instruction has an increased probability of not being a branch instruction; the method can therefore wait for all instructions to be released, find the branch instruction whose branch prediction target address is the instruction address of the OP_BUILD_START instruction, and record it so as to capture the branch instruction. Subsequently, the instruction address of the OP_BUILD_END instruction is used as an index to look up the recorded BTB entry, and if it hits, the branch instruction can be obtained.
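The two capture cases can be sketched as follows; all parameters are illustrative, and the retired-branch arrays stand in for whatever release-queue records the hardware actually keeps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Case 1: the loop fits in the OC -> the instruction right before
 * OP_BUILD_END is the object branch instruction.
 * Case 2: the loop exceeds the OC -> after all loop instructions are
 * released, pick the branch whose predicted target equals the address
 * of OP_BUILD_START. */
uint64_t find_object_branch(bool loop_fits_in_oc,
                            uint64_t insn_before_end_pc,
                            const uint64_t *retired_branch_pcs,
                            const uint64_t *retired_branch_targets,
                            int n_retired,
                            uint64_t op_build_start_pc)
{
    if (loop_fits_in_oc)
        return insn_before_end_pc;                    /* case 1    */
    for (int i = 0; i < n_retired; i++)               /* case 2    */
        if (retired_branch_targets[i] == op_build_start_pc)
            return retired_branch_pcs[i];
    return 0;                                         /* not found */
}
```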
For example, in at least one embodiment of the present disclosure, determining whether the object branch instruction jumps according to the branch instruction information, so as to determine whether to update the CONFIDENCE value of the start OC fetch mode corresponding to the object branch instruction, includes: when the execution direction of the object branch instruction is determined to be a non-jump (Not Taken) according to the branch instruction information, determining to decrease the confidence value of the start OC fetch mode corresponding to the object branch instruction; and when the execution direction of the object branch instruction is determined to be a jump (Taken) according to the branch instruction information, determining to increase the confidence value of the start OC fetch mode corresponding to the object branch instruction. Here, a larger confidence value indicates a higher probability of starting the OC fetch mode, and conversely a smaller value indicates a lower probability.
For example, in at least one embodiment of the present disclosure, writing the updated confidence value of the start OC fetch mode includes: when the OP_BUILD_END instruction is released, writing the updated confidence value of the start OC fetch mode in the branch prediction buffer (BTB) entry corresponding to the object branch instruction, e.g., in the Target address field of the BTB entry.
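A minimal sketch of this update step follows, assuming a software model of the BTB in which the confidence value sits alongside the target field; the map-based lookup and the 3-bit saturation bound are assumptions for illustration only.

```cpp
#include <cstdint>
#include <unordered_map>

// Assumed software model of a BTB entry; the text stores the confidence in
// the Target address field, so here it simply sits next to the target.
struct BtbEntry {
    uint64_t target;        // branch target address field
    uint8_t  oc_confidence; // confidence value of the start OC fetch mode
};

// Saturating update performed when the OP_BUILD_END instruction is released.
// The 3-bit saturation bound (0..7) is an assumption for illustration.
void update_oc_confidence(std::unordered_map<uint64_t, BtbEntry>& btb,
                          uint64_t object_branch_addr, bool taken) {
    auto it = btb.find(object_branch_addr); // read the entry by branch address
    if (it == btb.end()) return;            // no entry: nothing to update
    uint8_t& c = it->second.oc_confidence;
    if (taken) { if (c < 7) ++c; }          // jump: increase confidence
    else       { if (c > 0) --c; }          // non-jump: decrease confidence
}
```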
FIG. 5 illustrates a flow chart of an example instruction fetch mode training method in accordance with at least one embodiment of the present disclosure. As shown in FIG. 5, the training method includes steps 510-580:
At step 510, the current instruction (or the micro instruction corresponding thereto) is obtained via the IC fetch mode or the OC fetch mode.
At step 520, the current instruction is decoded into a micro instruction, or the micro instruction is retrieved from the micro instruction cache, and the micro instruction is provided to the micro instruction queue for issue.
At step 530, the OP_BUILD_END instruction is identified, executed, and released.
As described above, for example, the OP_BUILD_START/OP_BUILD_END instructions perform no specific operation in the fixed-point execution unit, i.e., each is treated as a null instruction (NOP).
At step 540, it is determined whether the instruction preceding the OP_BUILD_END instruction is a branch instruction; if so, the branch instruction is recorded as the critical branch instruction and the flow proceeds to step 560; if not, the flow proceeds to step 551.
In step 551, the instruction address of the branch instruction preceding the OP_BUILD_END instruction is recorded.
At step 552, the BTB is looked up to determine whether there is a BTB entry for the released branch instruction; if so, the flow proceeds to step 553; otherwise, it proceeds to step 554.
At step 553, the released branch instruction that hits in the BTB is recorded.
In step 554, a branch instruction in the release queue is fetched, and its jump address is compared with the instruction address corresponding to the OP_BUILD_START instruction.
In step 555, the branch instruction whose jump address is the instruction address corresponding to the OP_BUILD_START instruction is recorded.
At step 556, it is determined whether the total capacity of the micro instruction cache can store the entire loop body.
At step 560, attribute information for the critical branch instruction is obtained and analyzed.
The jump direction, jump destination address, etc. of the branch instruction can be obtained from the attribute information.
In step 570, when the execution direction of the object branch instruction is determined to be a non-jump (Not Taken) according to the branch instruction information, the confidence value of the start OC fetch mode corresponding to the branch instruction is decreased; when the execution direction corresponding to the branch instruction is determined to be a jump (Taken) according to the branch instruction information, the confidence value of the start OC fetch mode is increased.
In step 580, where training of the start OC fetch mode exists, updating of the BTB is started.
For example, the instruction address of the branch instruction is used to read the BTB entry corresponding to that branch instruction, the confidence value is incremented/decremented, and the result is written to the Target field of the BTB entry.
At least one embodiment of the present disclosure provides a method for predicting an instruction fetch mode. The prediction method includes: acquiring an object branch instruction in a target program including an OP_BUILD_START instruction and an OP_BUILD_END instruction; and querying the object branch instruction in a branch prediction buffer to determine whether to perform branch prediction on the object branch instruction and whether the object instruction enters the OC fetch mode, wherein the entry of the branch prediction buffer includes a confidence value of the start OC fetch mode for predicting whether to enter the OC fetch mode.
For example, in the prediction method of the instruction fetch mode of at least one embodiment of the present disclosure, in response to the branch prediction direction of the object branch instruction being a jump and the confidence value of the start OC fetch mode being greater than a set value, the OC fetch mode is entered after the branch target address of the object branch instruction arrives; and in response to the branch prediction direction of the object branch instruction being a non-jump, or the confidence value of the start OC fetch mode being less than or equal to the set value, the current fetch mode is maintained after the branch target address of the object branch instruction arrives.
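Expressed as code, the decision rule reads as follows; the enum, the function name, and the default set value are illustrative assumptions rather than part of the disclosure.

```cpp
enum class FetchMode { IC, OC };

// Decision rule from the text: enter the OC fetch mode only when the branch
// is predicted taken AND the confidence exceeds the set value; otherwise the
// current mode is kept. The default set value of 4 is an assumption.
FetchMode next_fetch_mode(FetchMode current, bool predicted_taken,
                          unsigned oc_confidence, unsigned set_value = 4) {
    if (predicted_taken && oc_confidence > set_value)
        return FetchMode::OC; // switch after the branch target address arrives
    return current;           // non-jump or confidence <= set value: keep mode
}
```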
FIG. 6 illustrates a flow diagram of an example instruction fetch mode prediction method in accordance with at least one embodiment of the present disclosure. For example, the IC/OC fetch mode prediction function is incorporated into the branch prediction unit, and whether to enter the IC fetch mode or the OC fetch mode is determined while branch prediction is performed. As shown in FIG. 6, the prediction method includes steps 610-650:
At step 610, a fetch address is generated, resulting in the current instruction address.
At step 620, branch prediction is performed using the current instruction address to obtain branch prediction information (e.g., including the branch prediction direction, the branch target address, etc.) and the confidence value of the start OC fetch mode; based on the branch prediction information, a new fetch address is obtained on the one hand, and the flow proceeds to step 630 on the other hand.
At step 630, based on the branch prediction direction and the confidence value of the start OC fetch mode, it is determined whether to start the IC fetch mode or the OC fetch mode, and the branch target address is waited for; in the former case the flow proceeds to step 640, otherwise it proceeds to step 650.
Specifically, if the branch prediction direction of the object branch instruction is a jump (Taken) and the confidence value of the start OC fetch mode is greater than the set value, it is determined that the OC fetch mode is started; if the branch prediction direction of the object branch instruction is a non-jump, or the confidence value of the start OC fetch mode is less than or equal to the set value, the current fetch mode is maintained, which may be, for example, either the IC fetch mode or the OC fetch mode.
At step 640, the branch target address arrives, and the IC fetch mode is initiated.
At step 650, the branch target address arrives, and the OC fetch mode is enabled.
At least one embodiment of the present disclosure provides a processor including a decode unit and an OC (micro instruction cache), wherein the decode unit is configured to: in response to identifying an OP_BUILD_START instruction during execution of the target program, train instructions subsequent to the OP_BUILD_START instruction into the micro instruction cache; and in response to identifying an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction during execution of the target program, stop training instructions subsequent to the OP_BUILD_END instruction into the micro instruction cache. Likewise, the OP_BUILD_START instruction is at a first position in the target program, and the OP_BUILD_END instruction is at a second position in the target program after the first position.
For example, in the processor of at least one embodiment of the present disclosure, the target program includes an object loop body, the entry position of the object loop body is the first position, and the exit position of the object loop body is the second position.
For example, in the processor of at least one embodiment of the present disclosure, the decode unit is further configured to: enter the IC fetch mode in response to an OC fetch miss for a first object instruction following the OP_BUILD_START instruction, determine whether the first object instruction jumps out of the object loop body, and, in response to the first object instruction not jumping out of the object loop body, continue training the first object instruction and the instructions following it into the micro instruction cache.
For example, in the processor of at least one embodiment of the present disclosure, the decode unit is further configured to: in response to determining that the first object instruction does not jump out of the object loop body, create a first identification, and, after the first object instruction is decoded, determine based on the first identification to train the first object instruction and the instructions following it into the micro instruction cache.
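The following C++ sketch models this fallback-and-continue behavior. The structure, the flag name (mirroring the "first identification"), and the address-range test for "jumps out of the object loop body" are simplifying assumptions for illustration, not the disclosed hardware design.

```cpp
#include <cstdint>

// Assumed model of the decode-side state.
struct DecodeState {
    bool in_build_window = false; // between OP_BUILD_START and OP_BUILD_END
    bool first_flag      = false; // the "first identification"
};

// Called on an OC fetch miss for the first object instruction: execution
// falls back to the IC fetch mode, and if the instruction does not leave the
// loop body, the flag is set so that training resumes after decode.
void on_oc_miss(DecodeState& st, uint64_t next_pc,
                uint64_t loop_begin, uint64_t loop_end) {
    bool stays_in_loop = (next_pc >= loop_begin && next_pc < loop_end);
    if (stays_in_loop)
        st.first_flag = true;
}

// After the first object instruction is decoded, the flag tells the decoder
// to keep training it (and the following instructions) into the OC.
bool should_continue_training(const DecodeState& st) {
    return st.in_build_window && st.first_flag;
}
```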
For example, in at least one embodiment of the present disclosure, the processor described above further includes a branch prediction unit and a branch prediction buffer (BTB). The branch prediction unit is configured to: acquire the branch instruction information of the object branch instruction immediately preceding the OP_BUILD_END instruction, determine whether the object branch instruction jumps according to the branch instruction information so as to determine whether to update the confidence value of the start OC fetch mode corresponding to the object branch instruction, read the branch prediction buffer (BTB) entry corresponding to the object branch instruction, and write the updated confidence value of the start OC fetch mode.
For example, in the processor of at least one embodiment of the present disclosure, the branch prediction unit is further configured to: when the OP_BUILD_END instruction reaches the release stage, determine the branch instruction information of the object branch instruction.
For example, in at least one embodiment of the present disclosure, the target program includes an object loop body, the entry position of the object loop body is the first position, and the exit position of the object loop body is the second position. The branch prediction unit is further configured to: in response to the instruction length of the loop body being less than or equal to the capacity of the micro instruction cache, take the branch instruction immediately preceding the OP_BUILD_END instruction in the micro instruction cache as the object branch instruction; or, in response to the instruction length of the loop body being greater than the capacity of the micro instruction cache, take the branch instruction whose branch prediction target address is the instruction address of the OP_BUILD_START instruction as the object branch instruction when all instructions of the loop body are released.
For example, in the processor of at least one embodiment of the present disclosure, the branch prediction unit is further configured to: when the execution direction of the object branch instruction is determined to be a non-jump according to the branch instruction information, determine to decrease the confidence value of the start OC fetch mode; and when the execution direction of the object branch instruction is determined to be a jump according to the branch instruction information, determine to increase the confidence value of the start OC fetch mode.
For example, in the processor of at least one embodiment of the present disclosure, the branch prediction unit is further configured to: when the execution direction of the object branch instruction is determined, according to the branch instruction information, to be a non-jump for N consecutive times, determine to decrease the confidence value of the start OC fetch mode; and when the execution direction of the object branch instruction is determined, according to the branch instruction information, to be a jump for M consecutive times, determine to increase the confidence value of the start OC fetch mode.
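A minimal sketch of this hysteresis variant follows; the run counters and the default values N = M = 2 are assumptions chosen only for illustration.

```cpp
// Hysteresis variant of the confidence update: only change the confidence
// after N consecutive non-jumps or M consecutive jumps.
struct ConfidenceHysteresis {
    unsigned not_taken_run = 0;
    unsigned taken_run     = 0;

    // Returns -1 to decrease the confidence, +1 to increase it, 0 otherwise.
    int observe(bool taken, unsigned N = 2, unsigned M = 2) {
        if (taken) { ++taken_run; not_taken_run = 0; }
        else       { ++not_taken_run; taken_run = 0; }
        if (not_taken_run >= N) { not_taken_run = 0; return -1; }
        if (taken_run     >= M) { taken_run     = 0; return +1; }
        return 0;
    }
};
```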
For example, in the processor of at least one embodiment of the present disclosure, the branch prediction unit is further configured to: when the OP_BUILD_END instruction is released, write the updated confidence value of the start OC fetch mode in the branch prediction buffer entry corresponding to the object branch instruction.
For example, in the processor of at least one embodiment of the present disclosure, the branch prediction unit is further configured to: acquire an object branch instruction in a target program including an OP_BUILD_START instruction and an OP_BUILD_END instruction, and query the object branch instruction in the branch prediction buffer to determine whether to perform branch prediction on the object branch instruction and whether the object instruction enters the OC fetch mode.
For example, in the processor of at least one embodiment of the present disclosure, in response to the branch prediction direction of the object branch instruction being a jump and the confidence value of the start OC fetch mode being greater than a set value, the OC fetch mode is entered after the branch target address of the object branch instruction arrives; and in response to the branch prediction direction of the object branch instruction being a non-jump, or the confidence value of the start OC fetch mode being less than or equal to the set value, the current fetch mode is maintained.
At least one embodiment of the present disclosure further provides a non-transitory readable storage medium storing computer instructions, where the computer instructions, when executed by a processor, implement the program processing method, the instruction processing method, the training method of the instruction fetch mode, or the prediction method of the instruction fetch mode of any of the above embodiments.
Embodiments of the present disclosure do not limit the type of instruction set or microarchitecture employed by the processor; for example, a CISC or RISC microarchitecture may be employed, such as an X86-type, ARM-type, or RISC-V-type microarchitecture.
At least some embodiments of the present disclosure also provide an electronic device comprising a processor of any one of the embodiments described above. Fig. 7 is a schematic block diagram of an electronic device provided in at least one embodiment of the present disclosure.
The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device 1000 shown in fig. 7 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
For example, as shown in fig. 7, in some examples, an electronic device 1000 includes a processing device (e.g., a central processing unit, a graphics processor, etc.) 1001, which may include the processor of any of the above embodiments and may perform various suitable actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1008 into a random access memory (RAM) 1003. The RAM 1003 also stores various programs and data required for the operation of the computer system. The processor 1001, the ROM 1002, and the RAM 1003 are connected to one another via a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
For example, the following components may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; output devices 1007 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; storage devices 1008 including, for example, a magnetic tape, a hard disk, and the like; and communication devices 1009 including, for example, a network interface card such as a LAN card or a modem. The communication device 1009 may allow the electronic device 1000 to perform wireless or wired communication with other apparatuses to exchange data, performing communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read therefrom is installed into the storage device 1008 as needed. While fig. 7 illustrates an electronic device 1000 including various devices, it should be understood that not all of the illustrated devices are required to be implemented or included; more or fewer devices may be implemented or included instead.
For example, the electronic device 1000 may further include a peripheral interface (not shown), and the like. The peripheral interface may be any of various types of interfaces, such as a USB interface, a Lightning interface, etc. The communication device 1009 may communicate with a network, such as the Internet, an intranet, and/or a wireless network such as a cellular telephone network, a wireless local area network (LAN), and/or a metropolitan area network (MAN), and with other devices via wireless communication. The wireless communication may use any of a variety of communication standards, protocols, and technologies, including but not limited to Global System for Mobile communications (GSM), Enhanced Data GSM Environment (EDGE), wideband code division multiple access (W-CDMA), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi (e.g., based on the IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and/or IEEE 802.11n standards), voice over Internet protocol (VoIP), Wi-MAX, protocols for email, instant messaging, and/or Short Message Service (SMS), or any other suitable communication protocol.
For example, the electronic device 1000 may be any device such as a mobile phone, a tablet computer, a notebook computer, an electronic book, a game console, a television, a digital photo frame, a navigator, or any combination of a data processing device and hardware, which is not limited in the embodiments of the present disclosure.
While the disclosure has been described in detail with reference to the general description and the specific embodiments above, it will be apparent to those skilled in the art that certain modifications and improvements may be made based on the embodiments of the disclosure. Accordingly, such modifications and improvements made without departing from the spirit of the disclosure fall within the scope of the disclosure as claimed.
For the purposes of this disclosure, the following points are also noted:
(1) The drawings of the embodiments of the present disclosure relate only to the structures related to the embodiments of the present disclosure, and other structures may refer to the general design.
(2) In the drawings for describing embodiments of the present disclosure, the thickness of layers or regions is exaggerated or reduced for clarity, i.e., the drawings are not drawn to actual scale.
(3) The embodiments of the present disclosure and features in the embodiments may be combined with each other to arrive at a new embodiment without conflict.
The foregoing is merely specific embodiments of the disclosure, but the scope of the disclosure is not limited thereto, and the scope of the disclosure should be determined by the claims.
Claims (21)
1. A program processing method comprising:
creating an OP_BUILD_START instruction at a first location in the target program;
creating an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction at a second location in the target program after the first location,
wherein the OP_BUILD_START instruction is used for instructing a processor to train an instruction after the OP_BUILD_START instruction into a micro instruction cache, and the OP_BUILD_END instruction is used for informing the processor that the instruction after the OP_BUILD_END instruction stops being trained into the micro instruction cache.
2. The program processing method according to claim 1, further comprising:
determining an object loop body in the target program,
and taking the entry position of the object loop body as the first position and the exit position of the object loop body as the second position.
3. The program processing method according to claim 2, wherein the target program includes a first loop body and a second loop body,
determining an object loop body in the target program, comprising:
and determining the first loop body as the object loop body in response to the number of times or the number of instructions executed in the first loop body being greater than the number of times or the number of instructions executed in the second loop body.
4. The program processing method according to claim 1, further comprising:
determining an object loop body in the target program;
and in response to the number of program instructions of the object loop body being greater than the storage capacity of the micro instruction cache, and in response to the difference obtained by subtracting the number of program instructions stored in the micro instruction cache from the number of program instructions of the object loop body reaching a threshold value, taking the entry position of the object loop body as the first position and determining the second position within the object loop body.
5. The program processing method according to any one of claims 1 to 4, wherein the OP_BUILD_START instruction and the OP_BUILD_END instruction are both null instructions.
6. The program processing method of any of claims 1-4, wherein a compiler is used to create the OP_BUILD_START instruction at the first location and the OP_BUILD_END instruction at the second location.
7. An instruction processing method, comprising:
training instructions subsequent to the OP_BUILD_START instruction into a micro instruction cache in response to identifying the OP_BUILD_START instruction during execution of the target program;
in response to identifying an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction during execution of the target program, stopping training instructions subsequent to the OP_BUILD_END instruction into the micro instruction cache, wherein the OP_BUILD_START instruction is at a first location in the target program and the OP_BUILD_END instruction is at a second location in the target program after the first location.
8. The instruction processing method according to claim 7, wherein the target program includes an object loop body, an entry position of the object loop body is the first position, and an exit position of the object loop body is the second position.
9. The instruction processing method of claim 8, further comprising:
entering an instruction cache fetch mode in response to a micro instruction cache fetch miss for a first object instruction following the OP_BUILD_START instruction, and determining whether the first object instruction jumps out of the object loop body;
and in response to the first object instruction not jumping out of the object loop body, continuing training the first object instruction and instructions after the first object instruction into the micro instruction cache.
10. A training method for an instruction fetch mode of a processor, comprising, in a process of executing a target program including an OP_BUILD_START instruction and an OP_BUILD_END instruction:
acquiring branch instruction information of an object branch instruction immediately preceding the OP_BUILD_END instruction;
determining whether the object branch instruction jumps according to the branch instruction information to determine whether to update the confidence value of a start micro instruction cache fetch mode corresponding to the object branch instruction;
reading a branch prediction buffer entry corresponding to the object branch instruction and writing the updated confidence value of the start micro instruction cache fetch mode,
wherein the OP_BUILD_START instruction is used for instructing the processor to train instructions after the OP_BUILD_START instruction into a micro instruction cache, the OP_BUILD_END instruction is used for notifying the processor that instructions after the OP_BUILD_END instruction stop being trained into the micro instruction cache, the OP_BUILD_START instruction is at a first position of the target program, and the OP_BUILD_END instruction is at a second position of the target program after the first position.
11. The training method of claim 10, wherein acquiring the branch instruction information of the object branch instruction immediately preceding the OP_BUILD_END instruction comprises:
when the OP_BUILD_END instruction reaches a release stage, branch instruction information of the object branch instruction is determined.
12. The training method of claim 11, wherein the target program comprises an object loop body, an entry position of the object loop body being the first position, and an exit position of the object loop body being the second position.
13. The training method of claim 12, further comprising:
in response to the instruction length of the loop body being less than or equal to the capacity of the micro instruction cache, taking a branch instruction immediately preceding the OP_BUILD_END instruction in the micro instruction cache as the object branch instruction; or
in response to the instruction length of the loop body being greater than the capacity of the micro instruction cache, taking a branch instruction whose branch prediction target address is the instruction address of the OP_BUILD_START instruction as the object branch instruction when all instructions of the loop body are released.
14. The training method of claim 10, wherein determining whether the object branch instruction jumps according to the branch instruction information to determine whether to update the confidence value of the start micro instruction cache fetch mode corresponding to the object branch instruction comprises:
when the execution direction of the object branch instruction is determined to be a non-jump according to the branch instruction information, determining to decrease the confidence value of the start micro instruction cache fetch mode;
and when the execution direction of the object branch instruction is determined to be a jump according to the branch instruction information, determining to increase the confidence value of the start micro instruction cache fetch mode.
15. The training method of claim 10, wherein writing the updated confidence value of the start micro instruction cache fetch mode comprises:
when the OP_BUILD_END instruction is released, writing the updated confidence value of the start micro instruction cache fetch mode in the branch prediction buffer entry corresponding to the object branch instruction.
16. A method of predicting an instruction fetch mode, comprising:
acquiring an object branch instruction in a target program comprising an OP_BUILD_START instruction and an OP_BUILD_END instruction;
querying the object branch instruction in a branch prediction buffer to determine whether to perform branch prediction on the object branch instruction and whether the object instruction enters a micro instruction cache fetch mode,
wherein an entry of the branch prediction buffer includes a confidence value of a start micro instruction cache fetch mode for predicting whether to start the micro instruction cache fetch mode, the OP_BUILD_START instruction is used for instructing a processor to train instructions after the OP_BUILD_START instruction into a micro instruction cache, the OP_BUILD_END instruction is used for notifying the processor that instructions after the OP_BUILD_END instruction stop being trained into the micro instruction cache, the OP_BUILD_START instruction is at a first position of the target program, and the OP_BUILD_END instruction is at a second position after the first position in the target program.
17. The prediction method according to claim 16, wherein,
in response to the branch prediction direction of the object branch instruction being a jump and the confidence value of the start micro instruction cache fetch mode being greater than a set value, entering the micro instruction cache fetch mode after waiting for the arrival of the branch target address of the object branch instruction;
and in response to the branch prediction direction of the object branch instruction being a non-jump, or the confidence value of the start micro instruction cache fetch mode being less than or equal to the set value, maintaining the current fetch mode after the branch target address of the object branch instruction arrives.
18. A program processing apparatus comprising:
a first creation module configured to create an OP_BUILD_START instruction at a first location in the target program;
a second creation module configured to create an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction at a second location after the first location in the target program,
the OP_BUILD_START instruction is used for instructing a processor to train an instruction after the OP_BUILD_START instruction into a micro instruction cache, and the OP_BUILD_END instruction is used for informing the processor that the instruction after the OP_BUILD_END instruction stops being trained into the micro instruction cache.
19. A processor, comprising a decode unit and a micro instruction cache,
wherein the decode unit is configured to train instructions subsequent to an OP_BUILD_START instruction into the micro instruction cache in response to identifying the OP_BUILD_START instruction during execution of a target program, and to stop training instructions subsequent to an OP_BUILD_END instruction corresponding to the OP_BUILD_START instruction into the micro instruction cache in response to identifying the OP_BUILD_END instruction during execution of the target program,
wherein the OP_BUILD_START instruction is at a first position of the target program, and the OP_BUILD_END instruction is at a second position of the target program after the first position.
20. The processor of claim 19, further comprising: a branch prediction unit and a branch prediction buffer, wherein the branch prediction unit is configured to: acquire branch instruction information of an object branch instruction immediately preceding the OP_BUILD_END instruction, determine whether the object branch instruction jumps according to the branch instruction information to determine whether to update the confidence value of a start micro instruction cache fetch mode corresponding to the object branch instruction, read a branch prediction buffer entry corresponding to the object branch instruction, and write the updated confidence value of the start micro instruction cache fetch mode.
21. The processor of claim 20, wherein the branch prediction unit is further configured to: acquire an object branch instruction in a target program comprising an OP_BUILD_START instruction and an OP_BUILD_END instruction, and query the object branch instruction in the branch prediction buffer to determine whether to perform branch prediction on the object branch instruction and whether the object instruction enters the micro instruction cache fetch mode.