CN106066786A - Processor and processor operational approach - Google Patents

Processor and processor operational approach

Info

Publication number
CN106066786A
Authority
CN
China
Prior art keywords
micro
mentioned
computing
processor
columns
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610361995.3A
Other languages
Chinese (zh)
Inventor
杨梦晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhaoxin Integrated Circuit Co Ltd
Original Assignee
Shanghai Zhaoxin Integrated Circuit Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhaoxin Integrated Circuit Co Ltd
Priority to CN201610361995.3A
Publication of CN106066786A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017: Runtime instruction translation, e.g. macros

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

A processor includes a microcode collector, a micro-operation memory, and an execution unit. The microcode collector detects and collects a plurality of microcode instructions. The micro-operation memory stores a plurality of micro-operations, converts each microcode instruction into a first quantity of micro-operations, and, according to the state of the parallel indicator bit corresponding to each micro-operation, issues a parallel quantity of micro-operations in parallel. The execution unit executes the parallel quantity of micro-operations in parallel.

Description

Processor and processor operational approach
Technical field
The present invention relates to a processor, and in particular to a processor operating method that executes microcode instructions in parallel.
Background
The instructions executed by a processor are divided into simple instructions and microcode instructions. A simple instruction can be decoded by a decode unit into a single micro-operation (micro-op, uop) and then executed in one pass by an execute unit. Some processors, however, use microcode instructions to carry out particular procedures. A microcode instruction may be defined as a complex instruction, that is, an instruction of the instruction set architecture that cannot simply be decoded into a single micro-operation for the execution unit to execute. Through one or more operation lookup tables, a microcode instruction is translated into a series of micro-operations stored in a memory (e.g., a read-only memory). This "operation lookup table" may also be called "microcode"; using the microcode instruction as an index, the corresponding micro-operations can be looked up.
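As a rough illustration of the lookup described above, the following Python sketch models the microcode as a table indexed by instruction. The instruction names, the micro-op names, and the `decode` function are hypothetical and chosen only for illustration; none of them comes from the patent.

```python
# Hypothetical microcode table: a complex instruction indexes a stored
# sequence of micro-ops. All names here are illustrative assumptions.
MICROCODE_ROM = {
    "REP_MOVS": ["load", "store", "inc_src", "inc_dst", "dec_count", "branch"],
}


def decode(instruction):
    """Return the micro-ops for one instruction.

    A microcode instruction is looked up in the table; anything else is
    treated as a simple instruction that decodes to a single micro-op.
    """
    if instruction in MICROCODE_ROM:
        return list(MICROCODE_ROM[instruction])
    return [instruction.lower()]
```

In this model a simple instruction such as `"ADD"` yields one micro-op, while the hypothetical `"REP_MOVS"` entry yields the six micro-ops stored in the table.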
Although a microcode instruction can be translated into multiple micro-operations for execution, some old microcode (legacy microcode) exists to help the processor support old functions or instructions. Because this old microcode was developed many years ago, and because of the hardware limitations of earlier processors, for example single-issue processors, usually only one micro-operation can be issued to the back end for execution in each clock cycle, so the issue rate is low. It is therefore necessary to raise the issue rate of old microcode without changing the old microcode itself.
Summary of the invention
In view of this, the present invention proposes a processor that includes a microcode collector, a micro-operation memory, and an execution unit. The microcode collector detects and collects a plurality of microcode instructions. The micro-operation memory stores a plurality of micro-operations, converts each microcode instruction into a first quantity of micro-operations, and, according to the state of the parallel indicator bit corresponding to each micro-operation, issues a parallel quantity of the micro-operations in parallel. The execution unit executes the parallel quantity of micro-operations in parallel.
The present invention also proposes a processor operating method that includes: detecting and collecting a plurality of microcode instructions; converting the microcode instructions into a first quantity of micro-operations; issuing a parallel quantity of the micro-operations in parallel according to the state of the parallel indicator bit of each micro-operation; and executing the parallel quantity of micro-operations in parallel.
Brief description of the drawings
Fig. 1 is a block diagram showing a processor according to an embodiment of the invention;
Fig. 2 is a schematic diagram showing the parallel indicator bits corresponding to the micro-operations according to an embodiment of the invention;
Fig. 3 is a block diagram showing a processor according to another embodiment of the invention; and
Fig. 4 is a flowchart showing a processor operating method according to an embodiment of the invention.
Detailed description
The following description presents embodiments of the invention. Its purpose is to illustrate the general principles of the invention and should not be regarded as limiting the invention; the scope of the invention is defined by the claims.
It should be noted that the following disclosure provides multiple embodiments or examples for implementing different features of the invention. The specific component examples and arrangements described below are only intended to illustrate the spirit of the invention briefly, not to limit its scope. In addition, the description below may reuse the same reference numerals or text in multiple examples. Such reuse is only for simplicity and clarity and does not limit the relationship between the various embodiments and/or configurations discussed below. Furthermore, a description below of one feature being connected to, coupled to, and/or formed on another feature may in practice cover several different embodiments, including embodiments in which the features are in direct contact and embodiments in which additional features are formed between them so that the features are not in direct contact.
Fig. 1 is a block diagram showing a processor according to an embodiment of the invention. As shown in Fig. 1, processor 100 includes instruction cache 110, microcode collector 120, micro-operation memory 130, execution unit 140, and decoding unit 150. It will be understood that the complete processor 100 may include other necessary components; the description is simplified here in order to describe the technical features of the invention in detail.
Instruction cache 110 caches instructions of an instruction set architecture, such as the x86 instruction set architecture, including simple instructions and microcode instructions. According to an embodiment of the invention, a simple instruction can be translated directly by decoding unit 150 into a single micro-operation and then executed directly by execution unit 140 at the back end, whereas a microcode instruction cannot be executed after direct translation by decoding unit 150; it must first be translated via microcode into a corresponding series of micro-operations before execution unit 140 can execute it.
According to an embodiment of the invention, a simple instruction is translated directly by decoding unit 150 and sent to execution unit 140 via first path P1; according to another embodiment of the invention, a microcode instruction is sent to microcode collector 120 via second path P2. Microcode collector 120 detects and collects a plurality of microcode instructions fetched from instruction cache 110. Micro-operation memory 130 stores a plurality of micro-operations, converts each microcode instruction into a certain quantity of micro-operations, and, according to the state of the parallel indicator bit corresponding to each micro-operation, issues a parallel quantity of micro-operations in parallel to execution unit 140. When execution unit 140 receives the parallel quantity of micro-operations, it executes them in parallel. In one embodiment, microcode collector 120 and micro-operation memory 130 may also reside in another decoding unit arranged alongside decoding unit 150. It is worth noting that micro-operations are not issued directly from micro-operation memory 130 to execution unit 140 for execution; several intermediate processor pipeline stages known from the prior art, such as the register alias table (RAT), the reorder buffer (ROB), and the reservation station, are omitted here and are not described further.
According to an embodiment of the invention, for each row of micro-operations stored in micro-operation memory 130, a corresponding parallel indicator bit is stored, which indicates whether that micro-operation can be issued in parallel with the micro-operation before it. For example, if the parallel indicator bit of a row's micro-operation is logic "1", the micro-operation can be issued in parallel with the micro-operation before it; if it is logic "0", it cannot. The invention is of course not limited to this encoding. The parallel indicator bits may be stored in micro-operation memory 130 at positions corresponding to the storage positions of their respective micro-operations. Because old microcode is inconvenient to modify, in other embodiments the parallel indicator bits may instead be stored in a bit memory (not shown) outside micro-operation memory 130, in which each parallel indicator bit is stored in the original order of the micro-operations in the microcode. Details of the parallel indicator bits are described with reference to Fig. 2.
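The separate bit memory can be pictured as a second array kept in the same order as the unmodified microcode. The Python sketch below is only a model under that assumption; the micro-op names, the bit values, and the `fetch` helper are hypothetical.

```python
# Legacy microcode, left untouched (hypothetical micro-op names).
MICRO_OPS = ["load", "add", "store", "branch"]

# Separate bit memory: one parallel indicator bit per micro-op, stored in
# the same order as the micro-ops. 1 = may issue together with the
# micro-op before it, 0 = may not (the encoding used in the example above).
PARALLEL_BITS = [0, 1, 1, 0]


def fetch(index):
    """Return the micro-op at `index` together with its indicator bit."""
    return MICRO_OPS[index], PARALLEL_BITS[index]
```

Because the bits live outside the microcode, the table of micro-ops never has to be rewritten; only the two arrays have to stay aligned by index.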
According to an embodiment of the invention, micro-operation memory 130 also includes logic module 131, which detects the state of the parallel indicator bit of each micro-operation corresponding to a microcode instruction. Logic module 131 can be implemented by a segment of code placed after the microcode or included at the end of the microcode. Fig. 2 is a schematic diagram showing the parallel indicator bits corresponding to the micro-operations according to an embodiment of the invention. As shown in Fig. 2, the first micro-operation INST1 has the first parallel indicator bit PI1, the second micro-operation INST2 has the second parallel indicator bit PI2, and so on, and the N-th micro-operation INSTN has the N-th parallel indicator bit PIN.
When micro-operation memory 130 converts a microcode instruction into the first micro-operation INST1, the second micro-operation INST2, ..., and the N-th micro-operation INSTN, logic module 131 detects the first parallel indicator bit PI1, the second parallel indicator bit PI2, ..., and the N-th parallel indicator bit PIN, and judges whether INST1, INST2, ..., INSTN can be executed in parallel. For example, if the second parallel indicator bit PI2 is logic "1", the second micro-operation INST2 can be issued in parallel with the first micro-operation INST1; if the third parallel indicator bit PI3 is logic "1", the third micro-operation INST3 can be issued in parallel with the second micro-operation INST2; and so on, until the parallel indicator bit (e.g., PIM) of some micro-operation (e.g., INSTM) is logic "0", which means that INSTM cannot be issued in parallel with the micro-operations before it. Logic module 131 then judges that micro-operations INST1 to INST(M-1) can be issued in parallel. In general, ordinary micro-operations in microcode, such as arithmetic logic operations (ALU), single-instruction multiple-data operations (SIMD), and memory access operations, can be issued in parallel, whereas some special micro-operations, such as branches, interrupts, and reads and writes of model-specific registers (MSR) (RDMSR/WRMSR), must be issued in order.
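The scan over PI1 ... PIN can be sketched as a single pass that starts a new group whenever a bit is logic "0". This is a minimal Python model, assuming the "1 = may join the previous micro-op" encoding from the example; the function name `parallel_groups` is hypothetical, and the first bit is ignored because INST1 has no predecessor.

```python
def parallel_groups(indicator_bits):
    """Split micro-op indices into runs that may issue together.

    indicator_bits[i] == 1 means micro-op i may issue in parallel with
    the micro-op before it; 0 means it must start a new group.
    """
    groups = []
    for i, bit in enumerate(indicator_bits):
        if i > 0 and bit == 1:
            groups[-1].append(i)   # extend the current run
        else:
            groups.append([i])     # first micro-op, or bit is 0
    return groups
```

For the bits `[0, 1, 1, 0, 1]` the result is `[[0, 1, 2], [3, 4]]`: the fourth micro-op plays the role of INSTM, so the first three form one parallel group.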
According to an embodiment of the invention, when the logic module detects that, among the first quantity of micro-operations corresponding to the microcode instruction, the parallel indicator bits of a second quantity of micro-operations (e.g., INST1 to INST(M-1)) are at a first logic level, the corresponding micro-operations (e.g., INST1 to INST(M-1)) can be executed in parallel. According to an embodiment of the invention, the first logic level may be logic "0" or logic "1", depending on design needs.
In the present embodiment, micro-operation memory 130 selects a parallel quantity of micro-operations from the second quantity of micro-operations whose parallel indicator bits are at the first logic level and issues them in parallel to execution unit 140 for parallel execution. According to an embodiment of the invention, the parallel quantity is determined by the issue rate of processor 100 (for example, 4/6/8-issue, meaning that 4/6/8 micro-operations can be executed in parallel); that is, the parallel quantity depends on how many micro-operations the backend pipeline of processor 100 (such as execution unit 140) can execute in parallel. In the prior art, although the backend pipeline of processor 100 is capable of multi-issue, old microcode supports only single issue, so only one micro-operation can be issued to the back end of processor 100 in each clock cycle. Without changing the old microcode (legacy microcode) itself, the present invention achieves multi-issue to the back end of processor 100, makes full use of the back-end bandwidth of processor 100, and improves instruction execution efficiency.
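One way to picture the selection is a per-cycle schedule that honors both the indicator bits and the issue width. The Python sketch below is an assumption-laden model, not the patented logic: `schedule` and `issue_width` are hypothetical names, `issue_width` stands for the parallel quantity, and a run longer than the issue width is simply continued in the next cycle.

```python
def schedule(indicator_bits, issue_width):
    """Return a list of cycles, each a list of micro-op indices issued together.

    A micro-op joins the current cycle only if its indicator bit is 1
    and the cycle still has room; otherwise it opens a new cycle.
    """
    if issue_width < 1:
        raise ValueError("issue_width must be at least 1")
    cycles = []
    for i, bit in enumerate(indicator_bits):
        has_room = bool(cycles) and len(cycles[-1]) < issue_width
        if i > 0 and bit == 1 and has_room:
            cycles[-1].append(i)
        else:
            cycles.append([i])
    return cycles
```

With `issue_width=1` the schedule degenerates to one micro-op per cycle, which corresponds to the single-issue behavior of old microcode described above.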
According to an embodiment of the invention, when the first quantity of micro-operations that micro-operation memory 130 produces from a microcode instruction is greater than the parallel quantity, that is, when the number of micro-operations produced by converting the microcode instruction according to the microcode exceeds the number that execution unit 140 can execute in parallel, micro-operation memory 130 must repeatedly convert the microcode instruction into micro-operations until execution unit 140 has completed all micro-operations corresponding to the microcode instruction.
For example, suppose micro-operation memory 130 converts a microcode instruction into 100 micro-operations while execution unit 140 can execute only 4 in parallel. Assuming all 100 micro-operations can be issued in parallel, micro-operation memory 130 needs 25 clock cycles to complete this microcode instruction.
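The 25-cycle figure is a ceiling division of the micro-operation count by the parallel quantity. A small Python check, assuming every micro-operation can be issued in parallel as in the example (the function name `cycles_needed` is hypothetical):

```python
def cycles_needed(parallel_uops, issue_width):
    """Clock cycles needed to issue `parallel_uops` micro-ops, `issue_width` per cycle."""
    if issue_width < 1:
        raise ValueError("issue_width must be at least 1")
    # Integer ceiling division: a partially filled last cycle still counts.
    return (parallel_uops + issue_width - 1) // issue_width
```

`cycles_needed(100, 4)` gives 25, matching the example; a single-issue back end would need 100 cycles for the same micro-operations.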
Fig. 3 is a block diagram showing a processor according to another embodiment of the invention. As shown in Fig. 3, processor 300 includes instruction cache 310, microcode collector 320, micro-operation memory 330, execution unit 340, parallel execution buffer 350, and decoding unit 360, where instruction cache 310, microcode collector 320, micro-operation memory 330, execution unit 340, and decoding unit 360 correspond respectively to instruction cache 110, microcode collector 120, micro-operation memory 130, execution unit 140, and decoding unit 150 of Fig. 1.
The simple instructions and microcode instructions output by instruction cache 310 are sent toward execution unit 340 via first path P1 and second path P2, respectively. Compared with processor 100 of Fig. 1, processor 300 further includes parallel execution buffer 350, which temporarily stores some or all of the micro-operations that can be issued in parallel (the second quantity) among the first quantity of micro-operations that micro-operation memory 330 produces for a microcode instruction. In one embodiment, parallel execution buffer 350 also replaces logic module 131 of Fig. 1 in detecting and judging the state of the parallel indicator bits of the micro-operations. When parallel execution buffer 350 detects that a second quantity of micro-operations among the first quantity can be issued in parallel, micro-operation memory 330 pushes all of the second quantity of micro-operations into parallel execution buffer 350 in one clock cycle, where they are held. In some embodiments, if the storage capacity of parallel execution buffer 350 (a third quantity) is insufficient to hold the second quantity of micro-operations, micro-operation memory 330 pushes a third quantity of micro-operations out of the second quantity into parallel execution buffer 350 in one clock cycle, where they are held.
According to an embodiment of the invention, parallel execution buffer 350 then, in the following clock cycle, selects a parallel quantity of micro-operations from those it holds, issues them to execution unit 340 for parallel execution, and keeps the remaining micro-operations. According to another embodiment of the invention, parallel execution buffer 350 further selects a parallel quantity of micro-operations from the remaining micro-operations in the following clock cycle and issues them to execution unit 340 for parallel execution.
For example, suppose micro-operation memory 330 converts one microcode instruction into 200 micro-operations, of which 100 can be issued in parallel, while execution unit 340 can execute only 4 micro-operations in parallel per clock cycle. Assuming parallel execution buffer 350 is large enough to hold all 100 micro-operations, the buffer still needs 25 clock cycles to issue all 100, but microcode collector 320 and micro-operation memory 330 are released as soon as the 100 micro-operations have been pushed to parallel execution buffer 350 and can process the next microcode instruction, which further improves the execution efficiency of processor 300.
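The benefit of the buffer is that the front end is released long before the last micro-operation issues. The Python sketch below is a simplified cycle model under stated assumptions: in each cycle the buffer first issues up to `issue_width` micro-operations and the micro-operation memory then refills whatever space is free, with no limit on how many it can push per cycle. The function name and parameters are hypothetical.

```python
def simulate_buffer(parallel_uops, issue_width, buffer_capacity):
    """Return (cycle in which the front end is released, cycle of the last issue)."""
    if issue_width < 1 or buffer_capacity < 1:
        raise ValueError("issue_width and buffer_capacity must be at least 1")
    pending = parallel_uops      # still held by the micro-operation memory
    buffered = 0                 # currently in the parallel execution buffer
    cycle = 0
    front_end_free_at = None
    while pending or buffered:
        cycle += 1
        buffered -= min(issue_width, buffered)            # issue to the back end
        moved = min(buffer_capacity - buffered, pending)  # refill free slots
        buffered += moved
        pending -= moved
        if pending == 0 and front_end_free_at is None:
            front_end_free_at = cycle
    return front_end_free_at, cycle
```

Under this model `simulate_buffer(100, 4, 100)` returns `(1, 26)`: one cycle to push everything, then the 25 issue cycles from the example. With a buffer of only 8 entries the result is `(24, 26)`, so the front end stays occupied for most of the instruction.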
According to an embodiment of the invention, when parallel execution buffer 350 still holds unfinished micro-operations after sending a parallel quantity of micro-operations to execution unit 340 for parallel execution, it can again issue micro-operations that can be executed in parallel to execution unit 340 at the back end in the following clock cycle.
Fig. 4 is a flowchart showing a processor operating method according to an embodiment of the invention. The following description of the flowchart of Fig. 4 refers to Fig. 1 and Fig. 3 in order to describe the technical features of the invention in detail.
First, microcode collector 120, 320 detects and collects a plurality of microcode instructions output by instruction cache 110, 310 (step S1); that is, it picks the microcode instructions out of all the instructions of instruction cache 110 and gathers them. Micro-operation memory 130, 330 translates each microcode instruction held by microcode collector 120, 320 into a first quantity of micro-operations (step S2). Micro-operation memory 130, 330 then issues a parallel quantity of micro-operations in parallel according to the state of the parallel indicator bit of each micro-operation (step S3). According to an embodiment of the invention, micro-operation memory 130 also detects the state of the parallel indicator bit corresponding to each micro-operation, judges that a second quantity of micro-operations among the first quantity can be executed in parallel, and selects a parallel quantity of micro-operations from the second quantity for issue.
Alternatively, according to another embodiment of the invention (that of Fig. 3), parallel execution buffer 350 temporarily stores the first quantity (e.g., 200) of micro-operations produced by micro-operation memory 330, detects the state of the parallel indicator bits of the micro-operations, and judges that a second quantity (e.g., 100) of the first quantity (e.g., 200) of micro-operations can be executed in parallel. Parallel execution buffer 350 then selects a parallel quantity (e.g., 4) of micro-operations from the second quantity (e.g., 100) in each clock cycle and issues that parallel quantity of micro-operations.
When execution unit 140, 340 at the back end of the processor receives the parallel quantity of micro-operations, it executes them in parallel (step S4). According to an embodiment of the invention, the parallel quantity is determined by the issue rate of processor 100, 300; that is, the parallel quantity depends on how many micro-operations the backend pipeline (such as execution unit 140, 340) can execute in parallel.
According to another embodiment of the invention, when the parallel indicator bit of a micro-operation is at a second logic level, which differs from the first logic level, the micro-operation cannot be executed in parallel with the micro-operation before it. Micro-operation memory 130, 330 therefore issues the micro-operation whose parallel indicator bit is at the second logic level on its own, so that execution unit 140, 340 at the back end executes it sequentially.
The above outlines features of the embodiments. Those of ordinary skill in the art should be able to readily use the present invention as a basis for designing or adapting other arrangements that serve the same purposes and/or achieve the same advantages as the embodiments described here. Those of ordinary skill in the art will also understand that equivalent configurations do not depart from the spirit and scope of the invention, and that they can make various changes, substitutions, and alterations without departing from that spirit and scope. The illustrative methods show only exemplary steps, and these steps need not be performed in the order shown. Steps may be added, replaced, reordered, and/or removed as appropriate, consistent with the spirit and scope of the disclosed embodiments.

Claims (10)

1. A processor, characterized by comprising:
a microcode collector that detects and collects a plurality of microcode instructions;
a micro-operation memory that stores a plurality of micro-operations, converts each of the microcode instructions into a first quantity of the micro-operations, and issues a parallel quantity of the micro-operations in parallel according to the state of the parallel indicator bit corresponding to each of the micro-operations; and
an execution unit that executes the parallel quantity of the micro-operations in parallel.
2. The processor according to claim 1, characterized by further comprising:
a bit memory that stores the parallel indicator bits in correspondence with each of the plurality of micro-operations.
3. The processor according to claim 1, characterized in that the parallel indicator bit indicates whether each of the micro-operations can be issued in parallel with the micro-operation before it.
4. The processor according to claim 1, characterized in that the micro-operation memory comprises:
a logic module that detects the state of the parallel indicator bit corresponding to each of the first quantity of micro-operations, wherein when the logic module detects that the parallel indicator bits corresponding to a second quantity of the micro-operations among the first quantity of micro-operations are at a first logic level, the logic module judges that the second quantity of the micro-operations can be issued in parallel.
5. The processor according to claim 4, characterized in that when the logic module judges that the second quantity of the micro-operations can be issued in parallel, the micro-operation memory selects the parallel quantity of the micro-operations from the second quantity of the micro-operations and issues them in parallel to the execution unit for parallel execution.
6. The processor according to claim 1, characterized by further comprising:
a parallel execution buffer whose storage capacity holds at most a third quantity of the micro-operations, wherein when it is detected that a second quantity of micro-operations among the first quantity of micro-operations can be issued in parallel, the micro-operation memory pushes a third quantity of micro-operations out of the second quantity of micro-operations into the parallel execution buffer, where they are held, and the parallel execution buffer selects, in one clock cycle, the parallel quantity of the micro-operations from the third quantity of the micro-operations and issues them to the execution unit for parallel execution.
7. The processor according to claim 1, characterized by further comprising:
a parallel execution buffer, wherein when it is detected that a second quantity of micro-operations among the first quantity of micro-operations can be issued in parallel, the micro-operation memory pushes the second quantity of micro-operations into the parallel execution buffer, where they are held, and the parallel execution buffer selects, in one clock cycle, the parallel quantity of the micro-operations from the second quantity of the micro-operations and issues them to the execution unit for parallel execution.
8. A processor operating method, characterized by comprising:
detecting and collecting a plurality of microcode instructions;
converting the microcode instructions into a first quantity of micro-operations;
issuing a parallel quantity of the micro-operations in parallel according to the state of the parallel indicator bit of each of the micro-operations; and
executing the parallel quantity of the micro-operations in parallel.
9. The processor operating method according to claim 8, characterized by further comprising:
detecting the state of the parallel indicator bit corresponding to each of the micro-operations; and
when it is detected that the parallel indicator bits corresponding to a second quantity of the micro-operations among the first quantity of micro-operations are at a first logic level, determining that the second quantity of the micro-operations can be executed in parallel.
10. The processor operating method according to claim 9, characterized by further comprising:
when it is judged that the second quantity of the micro-operations can be issued in parallel, selecting the parallel quantity of the micro-operations from the second quantity of the micro-operations and issuing them in parallel to an execution unit for parallel execution.
CN201610361995.3A 2016-05-26 2016-05-26 Processor and processor operational approach Pending CN106066786A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610361995.3A CN106066786A (en) 2016-05-26 2016-05-26 Processor and processor operational approach

Publications (1)

Publication Number Publication Date
CN106066786A true CN106066786A (en) 2016-11-02

Family

ID=57420222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610361995.3A Pending CN106066786A (en) 2016-05-26 2016-05-26 Processor and processor operational approach

Country Status (1)

Country Link
CN (1) CN106066786A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1947092A (en) * 2004-04-22 2007-04-11 索尼计算机娱乐公司 Methods and apparatus for multi-processor pipeline parallelism
US20110276764A1 (en) * 2010-05-05 2011-11-10 International Business Machines Corporation Cracking destructively overlapping operands in variable length instructions
CN104572016A (en) * 2013-10-09 2015-04-29 Arm有限公司 Decoding a complex program instruction corresponding to multiple micro-operations
CN104937541A (en) * 2012-12-29 2015-09-23 英特尔公司 Apparatus and method for invocation of multi threaded accelerator

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111258495A (en) * 2018-12-03 2020-06-09 三星电子株式会社 Semiconductor memory device and method of operating the same
CN111679856A (en) * 2020-06-15 2020-09-18 上海兆芯集成电路有限公司 High-performance complex instruction decoding microprocessor
CN111679856B (en) * 2020-06-15 2023-09-08 上海兆芯集成电路股份有限公司 Microprocessor with high-efficiency complex instruction decoding
WO2022174542A1 (en) * 2021-02-19 2022-08-25 华为技术有限公司 Data processing method and apparatus, processor, and computing device

Similar Documents

Publication Publication Date Title
Vajapeyam et al. Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences
JP3542020B2 (en) Processor device and processor control method for executing instruction cache processing for instruction fetch alignment over multiple predictive branch instructions
US8904153B2 (en) Vector loads with multiple vector elements from a same cache line in a scattered load operation
CN104040490B (en) Code optimizer for the acceleration of multi engine microprocessor
JPS5991546A (en) Central processing unit
EP2600240A2 (en) Predecode repair cache for instructions that cross an instruction cache line
US7958336B2 (en) System and method for reservation station load dependency matrix
CN102483696A (en) Methods and apparatus to predict non-execution of conditional non-branching instructions
KR20090061075A (en) Effective use of a bht in processor having variable length instruction set execution modes
TW201030606A (en) Optimizing performance of instructions based on sequence detection or information associated with the instructions
KR20110025188A (en) Utilization of a store buffer for error recovery on a store allocation cache miss
CN1494677A (en) Digital signal processing apparatus
CN101535951A (en) Methods and apparatus for recognizing a subroutine call
CN111352659A (en) Misprediction recovery apparatus and method for branch and fetch pipelines
CN106066786A (en) Processor and processor operational approach
CN105446777A (en) Speculation concurrent execution method for non-aligned loading instructions of cache rows
CN101189574B (en) Instruction memory unit and method of operation
CN112241288A (en) Dynamic control flow reunion point for detecting conditional branches in hardware
CN114721724A (en) RISC-V instruction set-based six-stage pipeline processor
CN101371223B (en) Early conditional selection of an operand
KR102635965B1 (en) Front end of microprocessor and computer-implemented method using the same
CN104025034A (en) Configurable reduced instruction set core
Michaud et al. An exploration of instruction fetch requirement in out-of-order superscalar processors
GB2317724A (en) Multiple instruction parallel issue/execution management system
CN116302106A (en) Apparatus, method, and system for facilitating improved bandwidth of branch prediction units

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161102
