CN106066786A - Processor and processor operational approach - Google Patents
Processor and processor operational approach
- Publication number
- CN106066786A (application number CN201610361995.3A)
- Authority
- CN
- China
- Prior art keywords
- micro
- mentioned
- computing
- processor
- columns
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/3017—Runtime instruction translation, e.g. macros
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Executing Machine-Instructions (AREA)
Abstract
A processor includes a microcode collector, a micro-operation memory, and an execution unit. The microcode collector detects and collects a plurality of microcode instructions. The micro-operation memory stores a plurality of micro-operations, converts each microcode instruction into a first quantity of micro-operations, and issues a parallel quantity of micro-operations in parallel according to the state of the parallel indicator bit corresponding to each micro-operation. The execution unit executes the parallel quantity of micro-operations in parallel.
Description
Technical field
The present invention relates to a processor, and more particularly to a processor and a processor operating method that execute microcode instructions in parallel.
Background
The instructions executed by a processor are divided into simple instructions and microcode instructions. A simple instruction can be decoded by a decode unit into a single micro-operation (micro-op, uop) and then executed in one pass by an execution unit. Some processors, however, use microcode instructions to run specific programs. A microcode instruction may be defined as a complex instruction, that is, an instruction of the instruction set architecture that cannot simply be decoded into a single micro-operation for the execution unit to execute. A microcode instruction can be translated, through one or more operation lookup tables, into a series of micro-operations stored in a memory (e.g., a read-only memory). The "operation lookup table" may also be called "microcode": using the microcode instruction as an index, the corresponding micro-operations can be looked up.
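The lookup described above can be pictured as a table indexed by the microcode instruction. The following is only a minimal sketch: the instruction names, the micro-operation names, and the `translate` helper are hypothetical and do not come from the patent.

```python
# Hypothetical microcode table: each microcode instruction is the index,
# and the value is the series of micro-operations it translates to.
# All names here are illustrative only.
MICROCODE_ROM = {
    "COMPLEX_COPY": ["load_src", "store_dst", "dec_count", "branch_if_nonzero"],
    "COMPLEX_SWAP": ["load_a", "load_b", "store_a", "store_b"],
}


def translate(instruction):
    """Look up the micro-operations for a microcode instruction."""
    return list(MICROCODE_ROM[instruction])


print(translate("COMPLEX_SWAP"))
```

A real processor holds this table in a read-only memory rather than a dictionary; the dictionary only illustrates the index-to-sequence relationship.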
Although a microcode instruction can be translated into multiple micro-operations for execution, some old microcode (legacy microcode) exists to help the processor support old functions or instructions. Because this old microcode was developed many years ago, and because of the hardware limits of past processors, for example single-issue processors, typically only one micro-operation can be issued to the back end for execution in each clock cycle, so the issue rate is low. It is therefore necessary to raise the issue rate of old microcode without changing the old microcode itself.
Summary of the invention
In view of this, the present invention proposes a processor that includes a microcode collector, a micro-operation memory, and an execution unit. The microcode collector detects and collects a plurality of microcode instructions. The micro-operation memory stores a plurality of micro-operations, converts each microcode instruction into a first quantity of micro-operations, and issues a parallel quantity of micro-operations in parallel according to the state of the parallel indicator bit corresponding to each micro-operation. The execution unit executes the parallel quantity of micro-operations in parallel.
The present invention also proposes a processor operating method that includes: detecting and collecting a plurality of microcode instructions; converting the microcode instructions into a first quantity of micro-operations; issuing a parallel quantity of micro-operations in parallel according to the state of the parallel indicator bit of each micro-operation; and executing the parallel quantity of micro-operations in parallel.
Brief description of the drawings
Fig. 1 is a block diagram of a processor according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the parallel indicator bits corresponding to the micro-operations according to an embodiment of the invention;
Fig. 3 is a block diagram of a processor according to another embodiment of the invention; and
Fig. 4 is a flow chart of a processor operating method according to an embodiment of the invention.
Detailed description
The following description presents embodiments of the invention. Its purpose is to illustrate the general principles of the invention and it should not be taken as limiting the invention; the scope of the invention is defined by the claims.
It should be noted that the following disclosure may provide a number of embodiments or examples for practicing different features of the invention. The specific component examples and arrangements described below serve only to illustrate the spirit of the invention briefly and are not intended to limit its scope. In addition, the description below may reuse the same reference numerals or wording in multiple examples. Such reuse is only for simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed below. Furthermore, where the description below says that one feature is connected to, coupled to, and/or formed on another feature, this may in practice cover several different embodiments, including ones in which the features are in direct contact and ones in which additional features are formed between them so that the features are not in direct contact.
Fig. 1 is a block diagram of a processor according to an embodiment of the invention. As shown in Fig. 1, the processor 100 includes an instruction cache 110, a microcode collector 120, a micro-operation memory 130, an execution unit 140, and a decode unit 150. It will be understood that the processor 100 as a whole may include other necessary components; the figure is simplified here in order to describe the technical features of the invention in detail.
The instruction cache 110 caches instructions of an instruction set architecture such as the x86 instruction set architecture, including simple instructions and microcode instructions. According to an embodiment of the invention, a simple instruction can be translated directly by the decode unit 150 into a single micro-operation and then executed directly by the back-end execution unit 140. A microcode instruction cannot be executed after direct translation by the decode unit 150; it must first be translated via the microcode into a corresponding series of micro-operations before the execution unit 140 can execute it.
According to an embodiment of the invention, a simple instruction is translated directly by the decode unit 150 and sent to the execution unit 140 via a first path P1; according to another embodiment of the invention, microcode instructions are sent to the microcode collector 120 via a second path P2. The microcode collector 120 detects and collects the microcode instructions fetched from the instruction cache 110. The micro-operation memory 130 stores a plurality of micro-operations, converts each microcode instruction into a certain quantity of micro-operations, and, according to the state of the parallel indicator bit corresponding to each micro-operation, issues a parallel quantity of micro-operations in parallel to the execution unit 140. On receiving the parallel quantity of micro-operations, the execution unit 140 executes them in parallel. In one embodiment, the microcode collector 120 and the micro-operation memory 130 may also reside in another decode unit that operates in parallel with the decode unit 150. It is worth noting that the micro-operation memory 130 does not issue micro-operations directly to the execution unit 140 for execution: several processor pipeline stages known from the prior art lie in between, such as the register alias table (RAT), the reorder buffer (ROB), and the reservation station. They are omitted here and not described further.
According to an embodiment of the invention, a parallel indicator bit is stored for each row of micro-operations held in the micro-operation memory 130. The bit indicates whether that micro-operation can be issued in parallel with the micro-operation before it. For example, if the parallel indicator bit of a row is logic "1", the micro-operation can be issued in parallel with the micro-operation before it; if the bit is logic "0", it cannot. The invention is of course not limited to this encoding. The parallel indicator bits may be stored in the micro-operation memory 130 at positions corresponding to the storage locations of their respective micro-operations. Because old microcode is inconvenient to modify, in other embodiments the parallel indicator bits may instead be stored in a bit memory (not shown). This bit memory may be located outside the micro-operation memory 130 and stores each parallel indicator bit in the original order of the micro-operations in the microcode. Fig. 2 describes the parallel indicator bits in detail.
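One way to picture the separate bit memory is as a second array kept in the same order as the unmodified micro-operations. This is a sketch under that assumption; the bit values and the helper names `indicator_bit` and `paired` are hypothetical.

```python
# Micro-operations in their original microcode order (left unmodified).
MICRO_OPS = ["INST1", "INST2", "INST3", "INST4"]

# Hypothetical separate bit memory: entry i is the parallel indicator bit of
# MICRO_OPS[i]. 1 = may issue in parallel with the preceding micro-operation,
# 0 = may not (the encoding used in the example above).
PARALLEL_BITS = [1, 1, 0, 1]


def indicator_bit(index):
    """Read the parallel indicator bit for the micro-operation at `index`."""
    return PARALLEL_BITS[index]


def paired():
    """Pair each micro-operation with its bit, preserving microcode order."""
    return list(zip(MICRO_OPS, PARALLEL_BITS))


print(paired())
```

Keeping the bits in a parallel structure means the legacy microcode bytes never change; only the side table is added.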
According to an embodiment of the invention, the micro-operation memory 130 also includes a logic module 131 that detects the state of the parallel indicator bit of each micro-operation corresponding to a microcode instruction. The logic module 131 can be implemented by a section of code that follows the microcode or is included at the end of the microcode. Fig. 2 is a schematic diagram of the parallel indicator bits corresponding to the micro-operations according to an embodiment of the invention. As shown in Fig. 2, the first micro-operation INST1 has a first parallel indicator bit PI1, the second micro-operation INST2 has a second parallel indicator bit PI2, and the N-th micro-operation INSTN has an N-th parallel indicator bit PIN.
When the micro-operation memory 130 converts a microcode instruction into a first micro-operation INST1, a second micro-operation INST2, ..., and an N-th micro-operation INSTN, the logic module 131 detects the first parallel indicator bit PI1, the second parallel indicator bit PI2, ..., and the N-th parallel indicator bit PIN to judge whether the micro-operations INST1, INST2, ..., INSTN can be executed in parallel. For example, if the second parallel indicator bit PI2 is logic "1", the second micro-operation INST2 can be issued in parallel with the first micro-operation INST1; if the third parallel indicator bit PI3 is logic "1", the third micro-operation INST3 can be issued in parallel with the second micro-operation INST2; and so on, until the parallel indicator bit (e.g., PIM) of some micro-operation (e.g., INSTM) is logic "0", which means that INSTM cannot be issued in parallel with the micro-operation before it. The logic module 131 then determines that micro-operations INST1 to INST(M-1) can be issued in parallel. In general, ordinary micro-operations in microcode, such as arithmetic logic unit (ALU) operations, single instruction, multiple data (SIMD) operations, and memory access operations, can be issued in parallel, whereas some special micro-operations, such as branches, interrupts, and model-specific register (MSR) reads and writes (RDMSR/WRMSR), must be issued in order.
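The scan performed by the logic module can be sketched as a single pass over the indicator bits. This is an illustrative reading, not the patented circuit: it assumes the logic "1"/"0" encoding from the example, and it assumes that a micro-operation following a bit-0 micro-operation may rejoin a group only if its own bit is 1. The function name `parallel_groups` is hypothetical.

```python
def parallel_groups(bits):
    """Split micro-operation indices into runs that may issue together.

    bits[i] == 1: micro-operation i may issue in parallel with the one
    before it. bits[i] == 0: it may not, so it starts a new group.
    The first micro-operation always starts a group.
    """
    groups = []
    for i, bit in enumerate(bits):
        if i == 0 or bit == 0:
            groups.append([i])
        else:
            groups[-1].append(i)
    return groups


# INST1..INST4 may issue together; INST5 and INST6 (for example a branch
# followed by an MSR write) each have bit 0 and are issued on their own.
print(parallel_groups([1, 1, 1, 1, 0, 0]))  # [[0, 1, 2, 3], [4], [5]]
```

In the terms of the text, the first group here corresponds to INST1 to INST(M-1) with M = 5.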
According to an embodiment of the invention, when the logic module detects that, among the first quantity of micro-operations corresponding to the microcode instruction, the parallel indicator bits of a second quantity of micro-operations (e.g., micro-operations INST1 to INST(M-1)) are at a first logic level, the corresponding micro-operations (e.g., micro-operations INST1 to INST(M-1)) can be executed in parallel. According to an embodiment of the invention, the first logic level may be logic "0" or logic "1", depending on design needs.
In the present embodiment, the micro-operation memory 130 selects a parallel quantity of micro-operations from the second quantity of micro-operations whose parallel indicator bits are at the first logic level and issues them in parallel to the execution unit 140 for parallel execution. According to an embodiment of the invention, the parallel quantity is determined by the issue rate of the processor 100 (for example, 4/6/8-issue, meaning that 4/6/8 micro-operations can be executed in parallel). That is, the parallel quantity depends on the number of micro-operations that the back-end pipeline of the processor 100 (e.g., the execution unit 140) can execute in parallel. In the prior art, although the back-end pipeline of the processor 100 has multi-issue capability, old microcode supports only single issue, so only one micro-operation can be issued to the back end of the processor 100 in each clock cycle. The present invention achieves multi-issue to the back end of the processor 100 without changing the old microcode (legacy microcode) itself, makes full use of the back-end bandwidth of the processor 100, and improves instruction execution efficiency.
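Selecting a parallel quantity from the micro-operations that are eligible for parallel issue amounts to slicing that set into chunks no larger than the back end's issue capacity. A minimal sketch follows; the function name `issue_schedule` is hypothetical.

```python
def issue_schedule(ready_ops, parallel_quantity):
    """Return one list of micro-operations per clock cycle.

    ready_ops: the micro-operations judged able to issue in parallel
    (the "second quantity"). parallel_quantity: how many the back end can
    execute in parallel per cycle (for example 4, 6, or 8).
    """
    return [
        ready_ops[i:i + parallel_quantity]
        for i in range(0, len(ready_ops), parallel_quantity)
    ]


# Ten eligible micro-operations on a 4-issue back end take three cycles.
print(issue_schedule(list(range(10)), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With `parallel_quantity` set to 1 the same function reproduces the single-issue behavior of the prior art, one micro-operation per cycle.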
According to an embodiment of the invention, when the first quantity of micro-operations that the micro-operation memory 130 produces from a microcode instruction exceeds the parallel quantity, that is, when the number of micro-operations produced by converting the microcode instruction according to the microcode exceeds the number that the execution unit 140 can execute in parallel, the micro-operation memory 130 must repeatedly convert the microcode instruction into micro-operations until the execution unit 140 has completed all micro-operations corresponding to the microcode instruction.
For example, suppose the micro-operation memory 130 converts a microcode instruction into 100 micro-operations and the execution unit 140 can execute only 4 in parallel. Assuming all 100 micro-operations can be issued in parallel, the micro-operation memory 130 needs 25 clock cycles to complete this microcode instruction.
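The 25-cycle figure is a ceiling division of the micro-operation count by the parallel quantity. The sketch below reproduces it and, as an illustration only, applies the same formula to the 6-issue and 8-issue cases; `issue_cycles` is a hypothetical helper name.

```python
def issue_cycles(num_micro_ops, parallel_quantity):
    """Clock cycles needed when every micro-operation can issue in parallel."""
    # Integer ceiling division: a partially filled last cycle still counts.
    return (num_micro_ops + parallel_quantity - 1) // parallel_quantity


print(issue_cycles(100, 4))  # 25, the example in the text
print(issue_cycles(100, 1))  # 100, single issue
print(issue_cycles(100, 6))  # 17
print(issue_cycles(100, 8))  # 13
```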
Fig. 3 is a block diagram of a processor according to another embodiment of the invention. As shown in Fig. 3, the processor 300 includes an instruction cache 310, a microcode collector 320, a micro-operation memory 330, an execution unit 340, a parallel execution buffer 350, and a decode unit 360. The instruction cache 310, microcode collector 320, micro-operation memory 330, execution unit 340, and decode unit 360 correspond respectively to the instruction cache 110, microcode collector 120, micro-operation memory 130, execution unit 140, and decode unit 150 of Fig. 1.
The simple instructions and microcode instructions output by the instruction cache 310 are sent toward the execution unit 340 via a first path P1 and a second path P2, respectively. Compared with the processor 100 of Fig. 1, the processor 300 also includes a parallel execution buffer 350. The parallel execution buffer 350 temporarily stores some or all of the micro-operations that can be issued in parallel (the second quantity) among the first quantity of micro-operations that the micro-operation memory 330 produces for a microcode instruction. In one embodiment, the parallel execution buffer 350 also replaces the logic module 131 of Fig. 1 in detecting and judging the state of the parallel indicator bits corresponding to the micro-operations. When the parallel execution buffer 350 detects that a second quantity of micro-operations among the first quantity can be issued in parallel, the micro-operation memory 330 pushes all of the second quantity of micro-operations into the parallel execution buffer 350 in one clock cycle, where they are held. In some embodiments, if the storage capacity of the parallel execution buffer 350 (a third quantity) is not enough to hold the second quantity of micro-operations, the micro-operation memory 330 pushes a third quantity of micro-operations out of the second quantity into the parallel execution buffer 350 in one clock cycle, where they are held.
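The capacity rule can be sketched as a simple split: at most the buffer's capacity is pushed in one clock cycle, and the rest stays behind. The capacity values and the name `push_to_buffer` are hypothetical, and the sketch does not model what later happens to the micro-operations that did not fit.

```python
def push_to_buffer(parallel_ops, capacity):
    """Split eligible micro-operations into (buffered, remaining).

    parallel_ops: the "second quantity" of micro-operations that can issue
    in parallel. capacity: the buffer's storage capacity (the "third
    quantity"). At most `capacity` micro-operations are pushed in one cycle.
    """
    buffered = parallel_ops[:capacity]
    remaining = parallel_ops[capacity:]
    return buffered, remaining


ops = [f"uop{i}" for i in range(100)]
buffered, remaining = push_to_buffer(ops, 64)    # capacity too small
print(len(buffered), len(remaining))             # 64 36
buffered, remaining = push_to_buffer(ops, 128)   # capacity sufficient
print(len(buffered), len(remaining))             # 100 0
```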
According to an embodiment of the invention, in the following clock cycle the parallel execution buffer 350 selects a parallel quantity of micro-operations from those it holds, issues them to the execution unit 340 for parallel execution, and keeps the remaining micro-operations. According to another embodiment of the invention, in a subsequent clock cycle the parallel execution buffer 350 also selects the parallel quantity of micro-operations from the remaining micro-operations and issues them to the execution unit 340 for parallel execution.
For example, suppose the micro-operation memory 330 converts a microcode instruction into 200 micro-operations, of which 100 can be issued in parallel, and the execution unit 340 can execute only 4 in parallel per clock cycle. Assume the parallel execution buffer 350 is large enough to hold all 100 of these micro-operations. The parallel execution buffer 350 still needs 25 clock cycles to issue all 100 micro-operations, but the microcode collector 320 and the micro-operation memory 330 are released as soon as the 100 micro-operations have been pushed into the parallel execution buffer 350 and can process the next microcode instruction, which further improves the execution efficiency of the processor 300.
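The benefit of the buffer in this example can be illustrated by comparing how long the front end (collector and micro-operation memory) stays occupied with and without it. This is a rough model only: it assumes the one-cycle push described earlier and a buffer large enough for all eligible micro-operations, and the function names are hypothetical.

```python
from collections import deque


def drain_cycles(num_ops, parallel_quantity):
    """Cycles the buffer needs to issue all of its micro-operations."""
    buf = deque(range(num_ops))
    cycles = 0
    while buf:
        # Issue up to parallel_quantity micro-operations this cycle.
        for _ in range(min(parallel_quantity, len(buf))):
            buf.popleft()
        cycles += 1
    return cycles


def front_end_busy_cycles(num_ops, parallel_quantity, has_buffer):
    """Cycles before the front end can take the next microcode instruction."""
    if has_buffer:
        return 1  # assumption: everything is pushed to the buffer in one cycle
    return drain_cycles(num_ops, parallel_quantity)


print(drain_cycles(100, 4))                  # 25
print(front_end_busy_cycles(100, 4, True))   # 1
print(front_end_busy_cycles(100, 4, False))  # 25
```

The back end still spends 25 cycles either way; what changes is that the front end is free to start on the next microcode instruction after the push.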
According to an embodiment of the invention, if the parallel execution buffer 350 still holds unfinished micro-operations after it has sent a parallel quantity of micro-operations to the execution unit 340 for parallel execution, it issues further micro-operations that can be executed in parallel to the back-end execution unit 340 in the following clock cycle.
Fig. 4 is a flow chart of a processor operating method according to an embodiment of the invention. The flow chart of Fig. 4 is described below with reference to Fig. 1 and Fig. 3 in order to explain the technical features of the invention in detail.
First, the microcode collector 120, 320 detects and collects the microcode instructions output by the instruction cache 110, 310 (step S1); that is, it picks the microcode instructions out of all the instructions from the instruction cache 110, 310 and gathers them. The micro-operation memory 130, 330 translates each microcode instruction held by the microcode collector 120, 320 into a first quantity of micro-operations (step S2). The micro-operation memory 130, 330 then issues a parallel quantity of micro-operations in parallel according to the state of the parallel indicator bit of each micro-operation (step S3). According to an embodiment of the invention, the micro-operation memory 130 also detects the state of the parallel indicator bit corresponding to each micro-operation, determines that a second quantity of micro-operations among the first quantity can be executed in parallel, and selects a parallel quantity of micro-operations from the second quantity for issue.
In contrast, according to another embodiment of the invention shown in Fig. 3, the parallel execution buffer 350 temporarily stores the first quantity (e.g., 200) of micro-operations produced by the micro-operation memory 330, detects the state of the parallel indicator bits of the micro-operations, and determines that a second quantity (e.g., 100) of the first quantity (e.g., 200) of micro-operations can be executed in parallel. In each clock cycle the parallel execution buffer 350 then selects a parallel quantity (e.g., 4) of micro-operations from the second quantity (e.g., 100) and issues them.
When the execution unit 140, 340 at the back end of the processor receives the parallel quantity of micro-operations, it executes them in parallel (step S4). According to an embodiment of the invention, the parallel quantity is determined by the issue rate of the processor 100, 300; that is, the parallel quantity depends on the number of micro-operations that the back-end pipeline (e.g., the execution unit 140, 340) can execute in parallel.
According to another embodiment of the invention, when the parallel indicator bit of a micro-operation is at a second logic level, and the second logic level differs from the first logic level, that micro-operation cannot be executed in parallel with the micro-operation before it. The micro-operation memory 130, 330 therefore issues the micro-operation whose parallel indicator bit is at the second logic level on its own, so that the back-end execution unit 140, 340 executes it in order.
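Steps S1 to S4 can be strung together in a small simulation. Everything concrete here is hypothetical: the instruction name `CPLX`, the micro-operation names, the bit values, and the function `run`. The sketch assumes logic "1" is the first logic level and logic "0" the second, and it gives consecutive in-order micro-operations a bit of 0 each so that they issue individually.

```python
# Hypothetical microcode table: instruction -> list of
# (micro-operation, parallel indicator bit).
MICROCODE = {
    "CPLX": [("u1", 1), ("u2", 1), ("u3", 1), ("u4", 0), ("u5", 0)],
}


def run(instructions, parallel_quantity):
    """Return the micro-operations issued in each clock cycle."""
    cycles = []
    # S1: detect and collect microcode instructions. Anything not in the
    # table is treated as a simple instruction and is ignored here.
    collected = [ins for ins in instructions if ins in MICROCODE]
    for ins in collected:
        # S2: convert the instruction into its first quantity of micro-ops.
        uops = MICROCODE[ins]
        group = []
        for name, bit in uops:
            # S3: close the current group when the next micro-operation
            # may not join it (bit 0) or the issue capacity is reached.
            if group and (bit == 0 or len(group) == parallel_quantity):
                cycles.append(group)  # S4: executed in parallel
                group = []
            group.append(name)
        if group:
            cycles.append(group)
    return cycles


print(run(["ADD", "CPLX", "MOV"], 4))  # [['u1', 'u2', 'u3'], ['u4'], ['u5']]
print(run(["ADD", "CPLX", "MOV"], 2))  # [['u1', 'u2'], ['u3'], ['u4'], ['u5']]
```

The micro-operations `u4` and `u5` stand in for operations such as branches or MSR accesses that must be issued in order.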
The foregoing outlines features of the embodiments. Those of ordinary skill in the art should readily be able to use the present invention as a basis for designing or modifying other arrangements that carry out the same purposes and/or achieve the same advantages as the embodiments described herein. Those of ordinary skill in the art should also understand that such equivalent arrangements do not depart from the spirit and scope of the invention, and that they may make various changes, substitutions, and alterations without departing from the spirit and scope of the invention. The illustrated method shows only exemplary steps, and these steps need not be performed in the order shown. Steps may be added, replaced, reordered, and/or removed as appropriate, consistent with the spirit and scope of the disclosed embodiments.
Claims (10)
1. A processor, characterized by comprising:
a microcode collector that detects and collects a plurality of microcode instructions;
a micro-operation memory that stores a plurality of micro-operations, converts each of the microcode instructions into a first quantity of the micro-operations, and issues a parallel quantity of the micro-operations in parallel according to the state of a parallel indicator bit corresponding to each of the micro-operations; and
an execution unit that executes the parallel quantity of the micro-operations in parallel.
2. The processor according to claim 1, characterized by further comprising:
a bit memory that stores the parallel indicator bits in correspondence with each of the plurality of micro-operations.
3. The processor according to claim 1, characterized in that the parallel indicator bit indicates whether each of the micro-operations can be issued in parallel with the micro-operation before it.
4. The processor according to claim 1, characterized in that the micro-operation memory comprises:
a logic module that detects the state of the parallel indicator bit corresponding to each of the first quantity of micro-operations, wherein when the logic module detects that the parallel indicator bits corresponding to a second quantity of the micro-operations among the first quantity of micro-operations are at a first logic level, the logic module determines that the second quantity of the micro-operations can be issued in parallel.
5. The processor according to claim 4, characterized in that when the logic module determines that the second quantity of the micro-operations can be issued in parallel, the micro-operation memory selects the parallel quantity of the micro-operations from the second quantity of the micro-operations and issues them in parallel to the execution unit for parallel execution.
6. The processor according to claim 1, characterized by further comprising:
a parallel execution buffer having a storage capacity of at most a third quantity of the micro-operations, wherein when it is detected that a second quantity of micro-operations among the first quantity of micro-operations can be issued in parallel, the micro-operation memory pushes the third quantity of micro-operations among the second quantity of micro-operations into the parallel execution buffer, where they are held, and the parallel execution buffer selects, in one clock cycle, the parallel quantity of the micro-operations from the third quantity of the micro-operations and issues them to the execution unit for parallel execution.
7. The processor according to claim 1, characterized by further comprising:
a parallel execution buffer, wherein when it is detected that a second quantity of micro-operations among the first quantity of micro-operations can be issued in parallel, the micro-operation memory pushes the second quantity of micro-operations into the parallel execution buffer, where they are held, and the parallel execution buffer selects, in one clock cycle, the parallel quantity of the micro-operations from the second quantity of the micro-operations and issues them to the execution unit for parallel execution.
8. A processor operating method, characterized by comprising:
detecting and collecting a plurality of microcode instructions;
converting the microcode instructions into a first quantity of micro-operations;
issuing a parallel quantity of the micro-operations in parallel according to the state of a parallel indicator bit of each of the micro-operations; and
executing the parallel quantity of the micro-operations in parallel.
9. The processor operating method according to claim 8, characterized by further comprising:
detecting the state of the parallel indicator bit corresponding to each of the micro-operations; and
when it is detected that the parallel indicator bits corresponding to a second quantity of the micro-operations among the first quantity of micro-operations are at a first logic level, determining that the second quantity of the micro-operations can be executed in parallel.
10. The processor operating method according to claim 9, characterized by further comprising:
when it is determined that the second quantity of the micro-operations can be issued in parallel, selecting the parallel quantity of the micro-operations from the second quantity of the micro-operations and issuing them in parallel to an execution unit for parallel execution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610361995.3A CN106066786A (en) | 2016-05-26 | 2016-05-26 | Processor and processor operational approach |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106066786A true CN106066786A (en) | 2016-11-02 |
Family
ID=57420222
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610361995.3A Pending CN106066786A (en) | 2016-05-26 | 2016-05-26 | Processor and processor operational approach |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106066786A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1947092A (en) * | 2004-04-22 | 2007-04-11 | 索尼计算机娱乐公司 | Methods and apparatus for multi-processor pipeline parallelism |
US20110276764A1 (en) * | 2010-05-05 | 2011-11-10 | International Business Machines Corporation | Cracking destructively overlapping operands in variable length instructions |
CN104572016A (en) * | 2013-10-09 | 2015-04-29 | Arm有限公司 | Decoding a complex program instruction corresponding to multiple micro-operations |
CN104937541A (en) * | 2012-12-29 | 2015-09-23 | 英特尔公司 | Apparatus and method for invocation of multi threaded accelerator |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111258495A (en) * | 2018-12-03 | 2020-06-09 | Samsung Electronics Co., Ltd. | Semiconductor memory device and method of operating the same |
CN111679856A (en) * | 2020-06-15 | 2020-09-18 | Shanghai Zhaoxin Semiconductor Co., Ltd. | High-performance complex instruction decoding microprocessor |
CN111679856B (en) * | 2020-06-15 | 2023-09-08 | Shanghai Zhaoxin Semiconductor Co., Ltd. | Microprocessor with high-efficiency complex instruction decoding |
WO2022174542A1 (en) * | 2021-02-19 | 2022-08-25 | Huawei Technologies Co., Ltd. | Data processing method and apparatus, processor, and computing device |
Similar Documents
Publication | Title |
---|---|
Vajapeyam et al. | Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences |
JP3542020B2 (en) | Processor device and processor control method for executing instruction cache processing for instruction fetch alignment over multiple predictive branch instructions |
US8904153B2 (en) | Vector loads with multiple vector elements from a same cache line in a scattered load operation |
CN104040490B (en) | Code optimizer for the acceleration of multi engine microprocessor |
JPS5991546A (en) | Central processing unit |
EP2600240A2 (en) | Predecode repair cache for instructions that cross an instruction cache line |
US7958336B2 (en) | System and method for reservation station load dependency matrix |
CN102483696A (en) | Methods and apparatus to predict non-execution of conditional non-branching instructions |
KR20090061075A (en) | Effective use of a bht in processor having variable length instruction set execution modes |
TW201030606A (en) | Optimizing performance of instructions based on sequence detection or information associated with the instructions |
KR20110025188A (en) | Utilization of a store buffer for error recovery on a store allocation cache miss |
CN1494677A (en) | Digital signal processing apparatus |
CN101535951A (en) | Methods and apparatus for recognizing a subroutine call |
CN111352659A (en) | Misprediction recovery apparatus and method for branch and fetch pipelines |
CN106066786A (en) | Processor and processor operational approach |
CN105446777A (en) | Speculation concurrent execution method for non-aligned loading instructions of cache rows |
CN101189574B (en) | Instruction memory unit and method of operation |
CN112241288A (en) | Dynamic control flow reunion point for detecting conditional branches in hardware |
CN114721724A (en) | RISC-V instruction set-based six-stage pipeline processor |
CN101371223B (en) | Early conditional selection of an operand |
KR102635965B1 (en) | Front end of microprocessor and computer-implemented method using the same |
CN104025034A (en) | Configurable reduced instruction set core |
Michaud et al. | An exploration of instruction fetch requirement in out-of-order superscalar processors |
GB2317724A (en) | Multiple instruction parallel issue/execution management system |
CN116302106A (en) | Apparatus, method, and system for facilitating improved bandwidth of branch prediction units |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161102 |