US20120042152A1 - Elimination of read-after-write resource conflicts in a pipeline of a processor - Google Patents
Elimination of read-after-write resource conflicts in a pipeline of a processor
- Publication number
- US20120042152A1
- Authority
- US
- United States
- Prior art keywords
- instruction
- read
- write
- resource
- pipeline
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
Definitions
- the present invention relates to pipelined processors generally and, more particularly, to a method and/or apparatus for implementing an elimination of read-after-write resource conflicts in a pipeline of a processor.
- a read-after-write conflict occurs in a pipelined processor when an instruction writes to a register or a status bit, and a succeeding instruction attempts to read from the register or status bit before the register or the status bit has actually been updated.
- a typical read-after-write conflict occurs where an instruction I 2 attempts to read from a register R 1 before an instruction I 1 writes to the register R 1 :
- When the instruction I2 reaches stage S3 and is ready to read the register R1, the instruction I1 is only located at stage S4.
- the register R 1 does not become available to read until a cycle after the instruction I 1 is executed and the data D 1 is written to the register R 1 . Therefore an interlock mechanism of the processor will stall the instruction I 2 for two cycles until the data D 1 is in the register R 1 and available to read.
- Referring to FIG. 1, a diagram illustrating a conventional single cycle stall is shown.
- the instructions in conflict are a write instruction I1 (i.e., MOVE,L D1,R1) and a read instruction I3 (i.e., MOVE,L (R1),R2).
- An additional instruction I2 (i.e., ADD D4,D5) exists between the instructions I1 and I3.
- the instructions I1 to I3 progress through the early stages of the pipeline as normal in cycles 1 through 4.
- In cycle 5, the instruction I3 reaches stage S3 and is ready to read the register R1 and the instruction I1 writes to the register R1 from stage S5.
- Since the register R1 will not have the data D1 calculated by the instruction I1 stored within until a cycle after the write by the instruction I1, execution of the instruction I3 is delayed until cycle 6 before reading from the register R1. Therefore, the interlock mechanism generates a single cycle stall of the instruction I3.
- the number of cycles for which a read instruction is stalled depends on what stage the register is being read from, at what stage the register is being written to, and the distance between the two instructions that cause the conflict.
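The stall counts quoted in this background appear consistent with one simple relation, sketched below in Python. The function name `stall_cycles` and the closed-form expression are assumptions inferred from the worked examples, not language from the patent: for a write at stage W, a read at stage R, and a read instruction that trails the write by d instructions, the read would wait max(0, W - R - d + 1) cycles, where the +1 reflects that written data becomes readable one cycle after the write.

```python
def stall_cycles(write_stage: int, read_stage: int, distance: int) -> int:
    """Cycles a read must wait behind an earlier write to the same resource.

    distance is how many instructions the read trails the write by
    (1 means the read immediately follows the write).
    Illustrative model only: assumes one instruction enters the pipeline per
    cycle and that written data is readable one cycle after the write stage.
    """
    return max(0, write_stage - read_stage - distance + 1)


# Adjacent MOVE pair: write at S5, read at S3 gives a two-cycle stall.
print(stall_cycles(5, 3, 1))
# FIG. 1: one instruction sits in between (distance 2), giving a single-cycle stall.
print(stall_cycles(5, 3, 2))
```

Under the same assumptions, a read that reaches its stage only after the data is available yields zero, which is why the result is clamped at 0.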
- Referring to FIG. 2, a diagram illustrating a conventional three cycle stall is shown.
- the instructions in conflict are a write instruction I1 (i.e., CMPEQ D0,D1) to a bit T and a read instruction I4 (i.e., IFT ADDA R6,R7) of the bit T.
- Another write instruction I3 (i.e., CMPEQA R0,R1) that writes to the bit T resides between the instructions I1 and I4.
- An instruction I2 (i.e., IFT ADD D6,D7,D8) that reads the bit T is located between the instructions I1 and I3.
- In cycle 4, the instruction I4 reaches stage S4 and the instruction I1 reaches stage S7. Since the instruction I1 does not write until stage S9 in cycle 6, a three-cycle stall is generated by the interlock mechanism. The three-cycle stall delays the instruction I4 from reading the bit T until cycle 7. The interlock mechanism does not account for the instruction I3 writing to the bit T from stage S3 in cycle 2. The interlock mechanism generates the three-cycle stall regardless of the presence or absence of the write instruction I3.
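The cycle numbers in the FIG. 2 walkthrough can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper `cycle_at_stage` is a hypothetical name, and it assumes an unstalled instruction advances exactly one stage per cycle and that a written bit is readable one cycle after the write.

```python
def cycle_at_stage(stage: int, known_stage: int, known_cycle: int) -> int:
    """Cycle in which an unstalled instruction occupies `stage`, given one
    observed (stage, cycle) pair and one stage of progress per cycle."""
    return known_cycle + (stage - known_stage)


# I1 is at stage S7 in cycle 4, so it writes bit T from stage S9 two cycles later.
write_cycle = cycle_at_stage(9, known_stage=7, known_cycle=4)
# Assumption from the text: the written value is readable one cycle after the write.
ready_cycle = write_cycle + 1
# I4 is ready to read bit T at stage S4 in cycle 4.
read_cycle = 4
stall = max(0, ready_cycle - read_cycle)
print(write_cycle, ready_cycle, stall)  # 6 7 3
```

The result matches the text: the write happens in cycle 6, the read is delayed until cycle 7, and the conventional interlock inserts three stall cycles without considering the intermediate write by I3.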
- the present invention concerns an apparatus having a processor and a circuit.
- the processor generally has a pipeline.
- the circuit may be configured to (i) detect a first write instruction in the pipeline that writes to a resource, (ii) stall a read instruction in the pipeline where (a) a first read-after-write conflict exists between the first write instruction and the read instruction and (b) no other write instruction to the resource is scheduled between the first write instruction and the read instruction and (iii) not stall the read instruction due to the first read-after-write conflict where a second write instruction to the resource is scheduled between the first write instruction and the read instruction.
- the objects, features and advantages of the present invention include providing a method and/or apparatus for implementing an elimination of read-after-write resource conflicts in a pipeline of a processor that may (i) take into account intermediate instructions between conflicting instructions, (ii) influence access to resources, (iii) influence an update stage of the resources, (iv) eliminate read-after-write conflicts in certain cases, (v) reduce a number of stalls in a pipelined processor and/or (vi) improve performance of a pipelined processor.
- FIG. 1 is a diagram illustrating a conventional single cycle stall
- FIG. 2 is a diagram illustrating a conventional three cycle stall
- FIG. 3 is a block diagram of a device in accordance with a preferred embodiment of the present invention.
- FIG. 4 is a flow diagram of an example method for conflict detection
- FIG. 5 is a diagram illustrating an example flow of several instructions through several pipeline stages
- FIG. 6 is a diagram of an example implementation of a device and a compiler
- FIG. 7 is a diagram illustrating another example flow of several instructions through several pipeline stages.
- An interlocked pipeline of a processor generally detects and solves resource conflicts.
- Resource conflicts may happen in a processor when a resource (e.g., a register or a status bit) is read from and written to at different pipeline stages.
- a read-after-write conflict may occur when a particular stage writes to a resource and an earlier stage attempts to read from the same resource in accordance with a following instruction or instructions.
- In general, (i) a distance (e.g., number of instructions) between the read instruction and the write instruction and (ii) a distance (e.g., number of stages or cycles) between the writing stage and the reading stage in the pipeline define a number of interlock (e.g., stall) cycles that resolve the read-after-write conflict.
- Some embodiments of the present invention may omit an anticipated read-after-write conflict when an additional write to the resource occurs between the write and the read. Therefore, the number of inserted stalls may be reduced. If the additional write does not cause another read-after-write conflict, the stalls may be eliminated.
- the device (or apparatus) 100 may implement a pipelined processor.
- the device 100 may implement an interlocked pipelined processor.
- the device 100 generally comprises a circuit (or module) 102 , a circuit (or module) 104 , a circuit (or module) 106 and a circuit (or module) 108 .
- the circuits 102 to 108 may represent modules and/or blocks that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
- a signal (e.g., INSTRa) may be generated by the circuit 102 and presented to both the circuit 104 and the circuit 106 .
- the circuit 102 may also generate a signal (e.g., INSTRb) that is received by the circuit 104 .
- a signal (e.g., STALL) may be created by the circuit 104 and transferred to the circuit 106 .
- the circuit 106 and the circuit 108 may exchange a signal (e.g., DATA).
- the circuit 102 may implement an instruction memory circuit.
- the circuit 102 is generally operational to store instructions that are executed by the device 100 .
- a sequence of instructions may be presented to the circuits 104 and 106 in the signal INSTRa. Some instructions may be presented in the signal INSTRb to the circuit 104 .
- the circuit 104 generally implements a conflict detector circuit.
- the circuit 104 is generally operational to search for and detect resource conflicts among the instructions currently being executed and/or about to be executed. As instructions are loaded from the circuit 102 into the circuit 106 , a copy of each current instruction may be examined by the circuit 104 . Each current instruction may be compared against one or more other instructions received in the signal INSTRb. The other instructions are generally scheduled to follow the current instruction.
- the circuit 104 may be operational to detect read-after-write conflicts among the instructions. In certain cases, the read-after-write conflicts may be anticipated and later canceled as an intervening write may negate the conflict. For each conflict detected, the circuit 104 may generate corresponding information in the signal STALL. The information generally informs the circuit 106 how to overcome the conflict. For example, a read-after-write conflict may be overcome by stalling the read instruction until the written data has been updated and is ready to be read.
- the circuit 106 may implement a pipelined processor circuit having multiple stages.
- the circuit 106 is generally operational to process multiple instructions simultaneously, generally a different instruction in each of the stages.
- the instructions may be received from the circuit 102 through the signal INSTRa.
- Data generated and/or read by the executing instructions may be exchanged with the circuit 108 via the signal DATA.
- the circuit 106 may be operational to deal with resource conflicts based on the information received in the signal STALL. For example, the circuit 106 may stall a resource read operation in a particular stage for one or more cycles. The stalls generally allow a resource write operation sufficient time to write the data into the resource from a different pipeline stage of the circuit 106 .
- the circuit 108 may implement a data memory circuit.
- the circuit 108 is generally operational to buffer data for the circuit 106 via the signal DATA.
- the circuit 108 may be divided into individual bits, registers, blocks and/or pages. Data written to the circuit 108 from the circuit 106 may be available to the circuit 106 a clock cycle after being written. In some designs, part to all of the circuit 108 may be integrated into the circuit 106 .
- the method (or process) 120 generally comprises a step (or block) 122 , a step (or block) 124 , a step (or block) 126 , a step (or block) 128 , a step (or block) 130 , a step (or block) 132 , a step (or block) 134 , a step (or block) 136 and a step (or block) 138 .
- the method 120 may be implemented by the circuit 104 .
- the steps 122 to 138 may represent modules and/or blocks that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
- the method 120 generally starts at the step 122 waiting for an instruction.
- the circuit 104 may detect a current write instruction in the signal INSTRa.
- the current write instruction generally writes to a resource (e.g., resource X) at a particular stage (e.g., stage N) of the circuit 106 .
- Stage N may be any stage in the pipeline.
- the current write instruction may be in a sequence of instructions stored in the circuit 102 .
- the current write instruction may or may not be the beginning of a read-after-write conflict, depending on the subsequently-scheduled instructions in the sequence.
- the circuit 104 may look at the next instruction in the sequence in the step 128 .
- the next instruction may be obtained from the circuit 102 via the signal INSTRb.
- a check may be performed in the step 130 to see if the next instruction is another write instruction. If the next instruction is (i) not a write to the resource X and/or (ii) not at a stage earlier than stage G (e.g., the NO branch of step 130), the method 120 may proceed to step 132.
- In step 132, if the next instruction is all of (i) a read instruction (ii) from resource X at a stage (e.g., stage M) and (iii) stage M is earlier in the pipeline than stage G (e.g., the YES branch of step 132), the circuit 104 generally concludes that a read-after-write conflict exists between the current write instruction and the read instruction.
- stage M may be any stage in the pipeline.
- the circuit 104 may assert an interlock in step 134 for a total of G-M cycles.
- the circuit 104 may convey the interlock assertion to the circuit 106 in the signal STALL.
- the signal STALL generally commands the circuit 106 to stall the read instruction at the stage M for a calculated number of cycles.
- the stall may delay the read instruction long enough to allow the current write instruction to load (write, transfer, place) the data into the circuit 108 from stage N and the data to become available to read.
- a check may be performed in the step 138 to determine if more instructions should be examined or not. If the variable G has not reached zero (e.g., the NO branch of step 138 ), more instructions should be examined.
- the circuit 104 may obtain the next scheduled instruction from the circuit 102 via the signal INSTRb. Thereafter, the method 120 may continue with the step 130.
- the circuit 104 may conclude that no read-after-write conflict exists for the current write instruction at the stage N.
- the method 120 thus returns to the step 122 and waits for another instruction in the signal INSTRa.
- the circuit 104 may conclude that the current write instruction does not cause a read-after-write conflict. Therefore, the method 120 may return to the step 122 and wait for another instruction in the signal INSTRa. When the subsequent write instruction from step 130 appears in the signal INSTRa, the circuit 104 may begin another read-after-write conflict test based on the subsequent write instruction.
- Circuit 104 may run the method 120 independently for every executed instruction.
- the method 120 generally accounts for one or more write instructions between the current write instruction and a later read instruction. Consequently, a potential read-after-write conflict between the current write instruction and the later read instruction may be cancelled by an intermediate (e.g., subsequent) write instruction. Cancellation of the potential read-after-write resource conflict generally reduces the number of stalls performed by the circuit 106 . Therefore, a performance of the processor 106 may be increased. In some cases, however, a new read-after-write conflict may be detected between the intermediate write instruction and the later read instruction.
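The detection loop described for the method 120 can be sketched in Python as follows. This is a minimal illustration, not the patented circuit: the `Instr` record, the function name `detect_raw`, and the choice to start the counter variable G at the write stage N are assumptions made for the example; the step numbers in the comments follow the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    name: str
    writes: Optional[str] = None   # resource written, if any
    write_stage: int = 0
    reads: Optional[str] = None    # resource read, if any
    read_stage: int = 0


def detect_raw(current: Instr, following: Sequence[Instr]) -> Optional[Tuple[str, int]]:
    """Return (reader name, stall cycles) for the first conflict, else None."""
    if current.writes is None:
        return None
    g = current.write_stage                 # counter G, assumed to start at stage N
    for nxt in following:                   # step 128: look at the next instruction
        # Step 130: a later write to the same resource from an earlier stage
        # cancels any conflict that the current write might have caused.
        if nxt.writes == current.writes and nxt.write_stage < g:
            return None
        # Step 132: a read of the same resource at a stage M earlier than G.
        if nxt.reads == current.writes and nxt.read_stage < g:
            return (nxt.name, g - nxt.read_stage)   # step 134: stall G-M cycles
        g -= 1                              # step 136: decrement G
        if g == 0:                          # step 138: nothing left to examine
            return None
    return None


# FIG. 1 conditions: I1 writes R1 at S5, I2 is unrelated, I3 reads R1 at S3.
fig1 = [Instr("I1", writes="R1", write_stage=5),
        Instr("I2"),
        Instr("I3", reads="R1", read_stage=3)]
print(detect_raw(fig1[0], fig1[1:]))    # ('I3', 1)

# FIG. 2 conditions: I3 writes bit T from S3 between I1 (S9) and I4 (S4).
fig2 = [Instr("I1", writes="T", write_stage=9),
        Instr("I2", reads="T", read_stage=9),
        Instr("I3", writes="T", write_stage=3),
        Instr("I4", reads="T", read_stage=4)]
print(detect_raw(fig2[0], fig2[1:]))    # None
print(detect_raw(fig2[2], fig2[3:]))    # None
```

In the second sequence the intermediate write by I3 cancels the anticipated conflict between I1 and I4, and the separate run that starts at I3 finds no conflict either, so no stall is requested.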
- the method 120 may account for nested read-after-write conflicts.
- a read-after-write conflict for the resource X may have nested within it a read-after-write conflict for another resource (e.g., resource Y).
- a write to the resource Y may take place after the write to the resource X.
- a read from the resource Y may occur before the read from the resource X.
- the NO branch may be selected as the write is not to the resource X.
- the NO branch may be selected as the read is not from the resource X. Therefore, the read-after-write conflict for resource Y may have no impact on the read-after-write conflict for resource X.
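The nested case can be illustrated with a compact, self-contained variant of the same loop. The `Op` tuple, the function `stall_for`, and the stage numbers chosen below are hypothetical values for illustration only; the sketch reports each conflict independently and does not model how an applied stall shifts later instructions.

```python
from typing import NamedTuple, Optional, Sequence


class Op(NamedTuple):
    kind: str       # "write" or "read"
    resource: str
    stage: int


def stall_for(write: Op, following: Sequence[Op]) -> Optional[int]:
    """Stall cycles for the first read that conflicts with `write`, else None."""
    g = write.stage                       # counter G, assumed to start at the write stage
    for op in following:
        same = op.resource == write.resource
        if same and op.kind == "write" and op.stage < g:
            return None                   # an intermediate write cancels the conflict
        if same and op.kind == "read" and op.stage < g:
            return g - op.stage           # conflict found: stall G-M cycles
        g -= 1                            # operations on other resources only advance G
        if g == 0:
            break
    return None


# The write/read pair on Y is nested inside the write/read pair on X.
seq = [Op("write", "X", 6), Op("write", "Y", 4), Op("read", "Y", 3), Op("read", "X", 2)]
x_stall = stall_for(seq[0], seq[1:])
y_stall = stall_for(seq[1], seq[2:])
print(x_stall, y_stall)  # 2 1
```

The operations on resource Y take the NO branches while X is being tracked, so they only decrement the counter and leave the result for X to be decided by the later read of X.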
- the method 120 may produce the same results as shown in FIG. 1 where a true read-after-write resource conflict exists.
- the instruction I 1 writes to a resource X from stage S 5 and the instruction I 3 reads the resource X from earlier stage S 3 .
- the instruction I 2 may be obtained by the circuit 104 in step 128 . Steps 130 and 132 may determine that instructions I 1 and I 2 do not cause a read-after-write conflict.
- the counter variable G may be decremented from 5 to 4 in step 136 and instruction I 3 obtained in step 128 .
- the circuit 104 may determine in the step 132 that the instructions I 1 and I 3 cause a read-after-write conflict.
- the instruction I1 may write to the resource X from stage S5 and the circuit 104 may generate a stall in step 134 that prevents instruction I3 from reading the resource X until cycle 6.
- Referring to FIG. 5, a diagram illustrating an example flow of several instructions through several pipeline stages of the circuit 106 is shown.
- the example generally illustrates the conditions shown in FIG. 2 where the instruction I1 writes to a resource X from stage S9, the instruction I2 reads the resource X at stage S9, the instruction I3 writes to the same resource X from stage S3 and the instruction I4 reads the resource X at stage S4.
- the counter variable G may be decremented from 9 to 8 in the step 136 and the instruction I3 may be obtained from the circuit 102 in the step 128.
- the circuit 104 may detect that instruction I 3 is a write to the resource X at stage S 3 , which is before stage S 9 . Therefore, the circuit 104 may conclude in step 130 that instruction I 1 is not involved in any read-after-write conflicts for the resource X even though instruction I 4 would normally cause such a conflict.
- the device 140 may implement a pipelined processor. In some embodiments, the device 140 may implement an exposed pipelined processor.
- the device 140 generally comprises the circuit 102 , the circuit 106 and the circuit 108 .
- a circuit (or module) 142 may be in communication with the device 140 through a signal (e.g., INSTRc).
- the circuit 142 may represent a module and/or block that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
- the circuit 142 may implement a compiler circuit with a conflict detection capability.
- the circuit 142 is generally operational to perform the method 120 and adjust the sequence of instructions accordingly to resolve conflicts before the instructions are stored in the circuit 102 .
- the conflicts may be read-after-write resource conflicts.
- the conflicts may be resolved by inserting no-operation (NOP) instructions into the instruction schedule in step 134 , rather than stalling the read instructions in the circuit 106 .
- Referring to FIG. 7, a diagram illustrating another example flow of several instructions through several pipeline stages of the circuit 106 is shown.
- the example generally illustrates the same conditions as in FIG. 1 , where the instruction I 1 writes to a resource X from stage S 5 , and instruction I 3 reads from the resource X at stage S 3 .
- the instruction I 2 may be subsequently examined by the circuit 142 .
- Steps 130 and 132 generally determine that instruction I2 is not involved in a read-after-write resource conflict with instruction I1. Therefore, step 136 may reduce the counter variable G from 5 to 4.
- Instruction I 3 may be subsequently examined by the circuit 142 .
- Step 132 may determine that a read-after-write conflict for resource X exists between instructions I 1 and I 3 .
- Step 134 may calculate that a single cycle delay of instruction I 3 may resolve the conflict.
- the circuit 142 may insert a NOP instruction between the instructions I2 and I3 in the step 134.
- As illustrated in FIG. 7, when instruction I1 writes to the resource X from stage S5 in cycle 5, the instruction I3 is still in stage S2. When instruction I3 reaches stage S3 in cycle 6, the data written by instruction I1 is ready and available without any further delay.
- the circuit 142 may respond to the instructions I 1 to I 4 similar to the response of the circuit 104 .
- the circuit 142 may determine that (i) no read-after-write conflict exists between the instructions I 1 and I 2 , (ii) instruction I 3 negates any further read-after-write conflicts for the instruction I 1 and (iii) no read-after-write conflict exists between the instructions I 3 and I 4 . Therefore, the circuit 142 does not insert any NOP instructions in the sequence of instructions I 1 to I 4 .
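A compiler-side version of the same check could be sketched as below. The names `Instr`, `find_conflict` and `insert_nops` are hypothetical, and the single forward pass is a simplification assumed for this example rather than the patent's stated implementation: wherever the detection loop would have requested a stall of G-M cycles, the sketch inserts that many NOP instructions ahead of the read.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    name: str
    writes: Optional[str] = None
    write_stage: int = 0
    reads: Optional[str] = None
    read_stage: int = 0


NOP = Instr("NOP")


def find_conflict(seq: List[Instr], i: int) -> Optional[Tuple[int, int]]:
    """Return (index of the conflicting read, NOPs needed) for the write at seq[i]."""
    cur = seq[i]
    g = cur.write_stage                      # counter G, assumed to start at stage N
    for j in range(i + 1, len(seq)):
        nxt = seq[j]
        if nxt.writes == cur.writes and nxt.write_stage < g:
            return None                      # intermediate write cancels the conflict
        if nxt.reads == cur.writes and nxt.read_stage < g:
            return (j, g - nxt.read_stage)   # delay the read by G-M slots
        g -= 1
        if g <= 0:
            return None
    return None


def insert_nops(program: List[Instr]) -> List[Instr]:
    """Pad the schedule with NOPs so that no read-after-write conflict remains."""
    out = list(program)
    i = 0
    while i < len(out):
        if out[i].writes is not None:
            hit = find_conflict(out, i)
            while hit is not None:           # re-check after each insertion
                j, count = hit
                out[j:j] = [NOP] * count
                hit = find_conflict(out, i)
        i += 1
    return out


# FIG. 7 conditions: I1 writes X at S5, I2 is unrelated, I3 reads X at S3.
fig7 = [Instr("I1", writes="X", write_stage=5),
        Instr("I2"),
        Instr("I3", reads="X", read_stage=3)]
print([ins.name for ins in insert_nops(fig7)])   # ['I1', 'I2', 'NOP', 'I3']
```

For the four-instruction sequence of FIG. 5 the same pass leaves the schedule unchanged, since the intermediate write cancels the only anticipated conflict.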
- FIGS. 3 , 4 and 6 may be implemented using one or more of a conventional general purpose processor, digital computer, microprocessor, microcontroller, RISC (reduced instruction set computer) processor, CISC (complex instruction set computer) processor, SIMD (single instruction multiple data) processor, signal processor, central processing unit (CPU), arithmetic logic unit (ALU), video digital signal processor (VDSP) and/or similar computational machines, programmed according to the teachings of the present specification, as will be apparent to those skilled in the relevant art(s).
- the present invention may also be implemented by the preparation of ASICs (application specific integrated circuits), Platform ASICs, FPGAs (field programmable gate arrays), PLDs (programmable logic devices), CPLDs (complex programmable logic device), sea-of-gates, RFICs (radio frequency integrated circuits), ASSPs (application specific standard products) or by interconnecting an appropriate network of conventional component circuits, as is described herein, modifications of which will be readily apparent to those skilled in the art(s).
- the present invention thus may also include a computer product which may be a storage medium or media and/or a transmission medium or media including instructions which may be used to program a machine to perform one or more processes or methods in accordance with the present invention.
- Execution of instructions contained in the computer product by the machine, along with operations of surrounding circuitry may transform input data into one or more files on the storage medium and/or one or more output signals representative of a physical object or substance, such as an audio and/or visual depiction.
- the storage medium may include, but is not limited to, any type of disk including floppy disk, hard drive, magnetic disk, optical disk, CD-ROM, DVD and magneto-optical disks and circuits such as ROMs (read-only memories), RAMs (random access memories), EPROMs (electronically programmable ROMs), EEPROMs (electronically erasable ROMs), UVPROM (ultra-violet erasable ROMs), Flash memory, magnetic cards, optical cards, and/or any type of media suitable for storing electronic instructions.
- the elements of the invention may form part or all of one or more devices, units, components, systems, machines and/or apparatuses.
- the devices may include, but are not limited to, servers, workstations, storage array controllers, storage systems, personal computers, laptop computers, notebook computers, palm computers, personal digital assistants, portable electronic devices, battery powered devices, set-top boxes, encoders, decoders, transcoders, compressors, decompressors, pre-processors, post-processors, transmitters, receivers, transceivers, cipher circuits, cellular telephones, digital cameras, positioning and/or navigation systems, medical equipment, heads-up displays, wireless devices, audio recording, storage and/or playback devices, video recording, storage and/or playback devices, game platforms, peripherals and/or multi-chip modules.
- the elements of the invention may be implemented in other types of devices to meet the criteria of a particular application.
- the term “simultaneously” is meant to describe events that share some common time period but the term is not meant to be limited to events that begin at the same point in time, end at the same point in time, or have the same duration.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
Description
- The present invention relates to pipelined processors generally and, more particularly, to a method and/or apparatus for implementing an elimination of read-after-write resource conflicts in a pipeline of a processor.
- A read-after-write conflict occurs in a pipelined processor when an instruction writes to a register or a status bit, and a succeeding instruction attempts to read from the register or status bit before the register or the status bit has actually been updated. Consider the following examples based on a pipeline with 10 stages and the StarCore assembler language. A typical read-after-write conflict occurs where an instruction I2 attempts to read from a register R1 before an instruction I1 writes to the register R1:
- I1: MOVE,L D1,R1 ; Write R1 at stage S5
- I2: MOVE,L (R1),R2 ; Read R1 at stage S3
- When the instruction I2 reaches stage S3 and is ready to read the register R1, the instruction I1 is only located at stage S4. The register R1 does not become available to read until a cycle after the instruction I1 is executed and the data D1 is written to the register R1. Therefore an interlock mechanism of the processor will stall the instruction I2 for two cycles until the data D1 is in the register R1 and available to read.
- Referring to FIG. 1, a diagram illustrating a conventional single cycle stall is shown. In the example, the instructions in conflict are a write instruction I1 (i.e., MOVE,L D1,R1) and a read instruction I3 (i.e., MOVE,L (R1),R2). An additional instruction I2 (i.e., ADD D4,D5) exists between the instructions I1 and I3. The instructions I1 to I3 progress through the early stages of the pipeline as normal in cycles 1 through 4. In cycle 5, the instruction I3 reaches stage S3 and is ready to read the register R1 and the instruction I1 writes to the register R1 from stage S5. Since the register R1 will not have the data D1 calculated by the instruction I1 stored within until a cycle after the write by the instruction I1, execution of the instruction I3 is delayed until cycle 6 before reading from the register R1. Therefore, the interlock mechanism generates a single cycle stall of the instruction I3. The number of cycles for which a read instruction is stalled depends on what stage the register is being read from, at what stage the register is being written to, and the distance between the two instructions that cause the conflict.
- Referring to FIG. 2, a diagram illustrating a conventional three cycle stall is shown. In the example, the instructions in conflict are a write instruction I1 (i.e., CMPEQ D0,D1) to a bit T and a read instruction I4 (i.e., IFT ADDA R6,R7) of the bit T. Another write instruction I3 (i.e., CMPEQA R0,R1) that writes to the bit T resides between the instructions I1 and I4. An instruction I2 (i.e., IFT ADD D6,D7,D8) that reads the bit T is located between the instructions I1 and I3.
- In cycle 4, the instruction I4 reaches stage S4 and the instruction I1 reaches stage S7. Since the instruction I1 does not write until stage S9 in cycle 6, a three-cycle stall is generated by the interlock mechanism. The three-cycle stall delays the instruction I4 from reading the bit T until cycle 7. The interlock mechanism does not account for the instruction I3 writing to the bit T from stage S3 in cycle 2. The interlock mechanism generates the three-cycle stall regardless of the presence or absence of the write instruction I3.
- The present invention concerns an apparatus having a processor and a circuit. The processor generally has a pipeline.
- The circuit may be configured to (i) detect a first write instruction in the pipeline that writes to a resource, (ii) stall a read instruction in the pipeline where (a) a first read-after-write conflict exists between the first write instruction and the read instruction and (b) no other write instruction to the resource is scheduled between the first write instruction and the read instruction and (iii) not stall the read instruction due to the first read-after-write conflict where a second write instruction to the resource is scheduled between the first write instruction and the read instruction.
- The objects, features and advantages of the present invention include providing a method and/or apparatus for implementing an elimination of read-after-write resource conflicts in a pipeline of a processor that may (i) take into account intermediate instructions between conflicting instructions, (ii) influence access to resources, (iii) influence an update stage of the resources, (iv) eliminate read-after-write conflicts in certain cases, (v) reduce a number of stalls in a pipelined processor and/or (vi) improve performance of a pipelined processor.
- These and other objects, features and advantages of the present invention will be apparent from the following detailed description and the appended claims and drawings in which:
-
FIG. 1 is a diagram illustrating a conventional single cycle stall; -
FIG. 2 is a diagram illustrating a conventional three cycle stall; -
FIG. 3 is a block diagram of a device in accordance with a preferred embodiment of the present invention; -
FIG. 4 is a flow diagram of an example method for conflict detection; -
FIG. 5 is a diagram illustrating an example flow of several instructions through several pipeline stages; -
FIG. 6 is a diagram of an example implementation of a device and a compiler; and -
FIG. 7 is a diagram illustrating another example flow of several instructions through several pipeline stages.
- An interlocked pipeline of a processor generally detects and resolves resource conflicts. Resource conflicts may happen in a processor when a resource (e.g., a register or a status bit) is read from and written to in different pipeline stages. A read-after-write conflict may occur when a particular stage writes to a resource and an earlier stage attempts to read from the same resource in accordance with a following instruction or instructions. In general, (i) a distance (e.g., number of instructions) between the read instruction and the write instruction and (ii) a distance (e.g., number of stages or cycles) between the writing stage and the reading stage in the pipeline define a number of interlock (e.g., stall) cycles that resolve the read-after-write conflict. Some embodiments of the present invention may omit an anticipated read-after-write conflict when an additional write to the resource occurs between the write and the read. Therefore, the number of inserted stalls may be reduced. If the additional write does not cause another read-after-write conflict, the stalls may be eliminated.
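By way of a hedged illustration, the relationship between the two distances and the number of stall cycles may be sketched in Python as follows. The function name `stall_cycles` and its parameter names are hypothetical and do not appear in the specification. The arithmetic assumes the counting used by the method 120 of FIG. 4 (a counter that starts at the write stage N, is reduced by one for each instruction issued before the read, and is compared with the read stage M) and assumes that no other write to the resource lies between the write and the read.

```python
def stall_cycles(write_stage: int, read_stage: int, distance: int) -> int:
    """Stall cycles needed so that a read sees a prior write.

    write_stage: pipeline stage N from which the write instruction writes.
    read_stage:  pipeline stage M at which the read instruction reads.
    distance:    position of the read after the write in program order
                 (1 means the very next instruction).
    Assumes no intervening write to the same resource.
    """
    if distance < 1:
        raise ValueError("the read must follow the write")
    # Each instruction issued between the write and the read gives the
    # write one more cycle of head start, shrinking the gap by one.
    gap = write_stage - (distance - 1) - read_stage
    return max(0, gap)


# FIG. 1 conditions: write from S5, read at S3, read is two instructions later.
print(stall_cycles(5, 3, 2))  # 1
# FIG. 2 conditions: write from S9, read at S4, read is three instructions later.
print(stall_cycles(9, 4, 3))  # 3
```

The two calls generally reproduce the single cycle stall of FIG. 1 and the three-cycle stall of FIG. 2.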
- Referring to
FIG. 3, a block diagram of a device 100 is shown in accordance with a preferred embodiment of the present invention. The device (or apparatus) 100 may implement a pipelined processor. In some embodiments, the device 100 may implement an interlocked pipelined processor. The device 100 generally comprises a circuit (or module) 102, a circuit (or module) 104, a circuit (or module) 106 and a circuit (or module) 108. The circuits 102 to 108 may represent modules and/or blocks that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
- A signal (e.g., INSTRa) may be generated by the
circuit 102 and presented to both the circuit 104 and the circuit 106. The circuit 102 may also generate a signal (e.g., INSTRb) that is received by the circuit 104. A signal (e.g., STALL) may be created by the circuit 104 and transferred to the circuit 106. The circuit 106 and the circuit 108 may exchange a signal (e.g., DATA).
- The
circuit 102 may implement an instruction memory circuit. The circuit 102 is generally operational to store instructions that are executed by the device 100. A sequence of instructions may be presented to the circuits 104 and 106 in the signal INSTRa. Further instructions may be presented to the circuit 104 in the signal INSTRb.
- The
circuit 104 generally implements a conflict detector circuit. The circuit 104 is generally operational to search for and detect resource conflicts among the instructions currently being executed and/or about to be executed. As instructions are loaded from the circuit 102 into the circuit 106, a copy of each current instruction may be examined by the circuit 104. Each current instruction may be compared against one or more other instructions received in the signal INSTRb. The other instructions are generally scheduled to follow the current instruction. In some embodiments, the circuit 104 may be operational to detect read-after-write conflicts among the instructions. In certain cases, the read-after-write conflicts may be anticipated and later canceled as an intervening write may negate the conflict. For each conflict detected, the circuit 104 may generate corresponding information in the signal STALL. The information generally informs the circuit 106 how to overcome the conflict. For example, a read-after-write conflict may be overcome by stalling the read instruction until the written data has been updated and is ready to be read.
- The
circuit 106 may implement a pipelined processor circuit having multiple stages. The circuit 106 is generally operational to process multiple instructions simultaneously, generally a different instruction in each of the stages. The instructions may be received from the circuit 102 through the signal INSTRa. Data generated and/or read by the executing instructions may be exchanged with the circuit 108 via the signal DATA. The circuit 106 may be operational to deal with resource conflicts based on the information received in the signal STALL. For example, the circuit 106 may stall a resource read operation in a particular stage for one or more cycles. The stalls generally allow a resource write operation sufficient time to write the data into the resource from a different pipeline stage of the circuit 106.
- The
circuit 108 may implement a data memory circuit. The circuit 108 is generally operational to buffer data for the circuit 106 via the signal DATA. The circuit 108 may be divided into individual bits, registers, blocks and/or pages. Data written to the circuit 108 from the circuit 106 may be available to the circuit 106 a clock cycle after being written. In some designs, part or all of the circuit 108 may be integrated into the circuit 106.
- Referring to
FIG. 4, a flow diagram of an example method 120 for conflict detection is shown. The method (or process) 120 generally comprises a step (or block) 122, a step (or block) 124, a step (or block) 126, a step (or block) 128, a step (or block) 130, a step (or block) 132, a step (or block) 134, a step (or block) 136 and a step (or block) 138. The method 120 may be implemented by the circuit 104. The steps 122 to 138 may represent modules and/or blocks that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
- The
method 120 generally starts at the step 122 waiting for an instruction. In the step 124, the circuit 104 may detect a current write instruction in the signal INSTRa. The current write instruction generally writes to a resource (e.g., resource X) at a particular stage (e.g., stage N) of the circuit 106. Stage N may be any stage in the pipeline. The current write instruction may be in a sequence of instructions stored in the circuit 102. Upon detection of the current write instruction, the circuit 104 may set a counter variable (e.g., G) in the step 126 to match the stage number N from which the write takes place (e.g., G=N). The current write instruction may or may not be the beginning of a read-after-write conflict, depending on the subsequently-scheduled instructions in the sequence.
- The
circuit 104 may look at the next instruction in the sequence in the step 128. The next instruction may be obtained from the circuit 102 via the signal INSTRb. A check may be performed in the step 130 to see if the next instruction is another write instruction. If the next instruction is (i) not a write to the resource X and/or (ii) not at a stage earlier than stage G (e.g., the NO branch of step 130), the method 120 may proceed to step 132. In the step 132, if the next instruction (i) is a read instruction, (ii) reads from resource X at a stage (e.g., stage M) and (iii) stage M is earlier in the pipeline than stage G (e.g., the YES branch of step 132), the circuit 104 generally concludes that a read-after-write conflict exists between the current write instruction and the read instruction. The stage M may be any stage in the pipeline.
- In response to detecting the read-after-write conflict, the
circuit 104 may assert an interlock in step 134 for a total of G−M cycles. The circuit 104 may convey the interlock assertion to the circuit 106 in the signal STALL. The signal STALL generally commands the circuit 106 to stall the read instruction at the stage M for the calculated number of cycles. The stall may delay the read instruction long enough to allow the current write instruction to load (write, transfer, place) the data into the circuit 108 from stage N and the data to become available to read.
- If the next instruction is (i) not a read instruction and/or (ii) not a read of the resource X from a stage M earlier than stage G (e.g., the NO branch of step 132), the
circuit 104 may decrement the counter variable G (e.g., G=G−1) in the step 136. A check may be performed in the step 138 to determine if more instructions should be examined or not. If the variable G has not reached zero (e.g., the NO branch of step 138), more instructions should be examined. In the step 128, the circuit 104 may obtain the next scheduled instruction from the circuit 102 via the signal INSTRb. Thereafter, the method 120 may continue with the step 130. If the counter variable G has reached zero (e.g., the YES branch in the step 138), the circuit 104 may conclude that no read-after-write conflict exists for the current write instruction at the stage N. The method 120 thus returns to the step 122 and waits for another instruction in the signal INSTRa.
- Returning to the
step 130, if the next instruction is (i) a subsequent write instruction to the resource X and (ii) occurs at an earlier stage than stage G (e.g., the YES branch of step 130), the circuit 104 may conclude that the current write instruction does not cause a read-after-write conflict. Therefore, the method 120 may return to the step 122 and wait for another instruction in the signal INSTRa. When the subsequent write instruction from step 130 appears in the signal INSTRa, the circuit 104 may begin another read-after-write conflict test based on the subsequent write instruction.
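The steps 124 to 138 described above may be sketched in Python as shown below. This is only a minimal software model under stated assumptions, not the implementation of the circuit 104: the names `Instr` and `check_write`, and the representation of an instruction as an optional (resource, stage) write and an optional (resource, stage) read, are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    """Hypothetical instruction model: at most one write and one read."""
    name: str
    writes: Optional[Tuple[str, int]] = None  # (resource, stage written from)
    reads: Optional[Tuple[str, int]] = None   # (resource, stage read at)


def check_write(seq: List[Instr], i: int) -> Optional[Tuple[str, int]]:
    """Model steps 124 to 138 for the instruction seq[i].

    Returns (name of the read instruction, stall cycles) when a
    read-after-write conflict is found, otherwise None.
    """
    if seq[i].writes is None:
        return None                           # step 124: not a write
    resource, g = seq[i].writes               # step 126: G = N
    for nxt in seq[i + 1:]:                   # step 128: next instruction
        # Step 130: a later write to the same resource from a stage
        # earlier than G cancels any conflict for the current write.
        if nxt.writes and nxt.writes[0] == resource and nxt.writes[1] < g:
            return None
        # Step 132: a read of the resource at a stage M earlier than G.
        if nxt.reads and nxt.reads[0] == resource and nxt.reads[1] < g:
            return nxt.name, g - nxt.reads[1]  # step 134: G - M cycles
        g -= 1                                # step 136
        if g == 0:                            # step 138: stop looking
            return None
    return None


# Conditions of FIG. 2: the write by I3 lies between I1 and I4.
fig2 = [
    Instr("I1", writes=("T", 9)),
    Instr("I2", reads=("T", 9)),
    Instr("I3", writes=("T", 3)),
    Instr("I4", reads=("T", 4)),
]
print([check_write(fig2, i) for i in range(len(fig2))])
# [None, None, None, None]
```

In this sketch, running the check once per instruction mirrors the statement that the test restarts when the subsequent write instruction itself becomes the current write instruction. If the instruction I3 in the example were replaced by an instruction that does not touch the bit T, the check for the instruction I1 would report a three-cycle stall for the instruction I4.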
The circuit 104 may run the method 120 independently for every executed instruction. The method 120 generally accounts for one or more write instructions between the current write instruction and a later read instruction. Consequently, a potential read-after-write conflict between the current write instruction and the later read instruction may be cancelled by an intermediate (e.g., subsequent) write instruction. Cancellation of the potential read-after-write resource conflict generally reduces the number of stalls performed by the circuit 106. Therefore, the performance of the processor 106 may be increased. In some cases, however, a new read-after-write conflict may be detected between the intermediate write instruction and the later read instruction.
- By considering each executed instruction independently, the
method 120 may account for nested read-after-write conflicts. For example, a read-after-write conflict for the resource X may have nested within it a read-after-write conflict for another resource (e.g., resource Y). In particular, a write to the resource Y may take place after the write to the resource X. A read from the resource Y may occur before the read from the resource X. When the write to resource Y is considered at the step 130, the NO branch may be selected as the write is not to the resource X. When the read from resource Y is considered at the step 132, the NO branch may be selected as the read is not from the resource X. Therefore, the read-after-write conflict for resource Y may have no impact on the read-after-write conflict for resource X.
- The
method 120 may produce the same results as shown in FIG. 1 where a true read-after-write resource conflict exists. By way of example, consider the situation of FIG. 1 where the instruction I1 writes to a resource X from stage S5 and the instruction I3 reads the resource X from earlier stage S3. As the instruction I1 is transferred from the circuit 102 to the circuit 106, the circuit 104 may detect the write to resource X from stage S5 per step 124 (G=N=5). The instruction I2 may be obtained by the circuit 104 in step 128. Steps 130 and 132 may both take the NO branch for the instruction I2.
- The counter variable G may be decremented from 5 to 4 in
step 136 and instruction I3 obtained in step 128. The circuit 104 may determine in the step 132 that the instructions I1 and I3 cause a read-after-write conflict. In cycle 5, the instruction I1 may write to the resource X from stage S5 and the circuit 104 may generate a stall in step 134 that prevents instruction I3 from reading the resource X until cycle 6.
- Referring to
FIG. 5, a diagram illustrating an example flow of several instructions through several pipeline stages of the circuit 106 is shown. The example generally illustrates the conditions shown in FIG. 2 where the instruction I1 writes to a resource X from stage S9, the instruction I2 reads the resource X at stage S9, the instruction I3 writes to the same resource X from stage S3 and the instruction I4 reads the resource X at stage S4.
- When the instruction I1 is copied into the pipeline in the
circuit 106, the circuit 104 may detect the write to resource X at the stage S9 per step 124 (G=N=9). The circuit 104 may obtain a copy of the instruction I2 in the step 128. Steps 130 and 132 may find no conflict for the instruction I2, since the instruction I2 reads the resource X at stage S9 in cycle 7, which is after instruction I1 writes to the resource in cycle 6. The counter variable G may be decremented from 9 to 8 in the step 136 and the instruction I3 may be obtained from the circuit 102 in the step 128. The circuit 104 may detect that instruction I3 is a write to the resource X at stage S3, which is before stage S9. Therefore, the circuit 104 may conclude in step 130 that instruction I1 is not involved in any read-after-write conflicts for the resource X even though instruction I4 would normally cause such a conflict.
- When the instruction I3 is transferred to the
circuit 106, the circuit 104 may detect the write to the resource X at the stage S3 per step 124 (G=N=3). Steps 130 and 132 may find no conflict for the instruction I4, since the instruction I4 reads the resource X at stage S4 in cycle 4, which is well after instruction I3 writes to the resource X in cycle 2. Due to the operations of the circuit 104, the circuit 106 may not be stalled for the three cycles as shown in FIG. 2, resulting in better processing efficiency.
- Referring to
FIG. 6, a diagram of an example implementation of a device 140 is shown. The device (or apparatus) 140 may implement a pipelined processor. In some embodiments, the device 140 may implement an exposed pipelined processor. The device 140 generally comprises the circuit 102, the circuit 106 and the circuit 108. A circuit (or module) 142 may be in communication with the device 140 through a signal (e.g., INSTRc). The circuit 142 may represent a module and/or block that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
- The
circuit 142 may implement a compiler circuit with a conflict detection capability. The circuit 142 is generally operational to perform the method 120 and adjust the sequence of instructions accordingly to resolve conflicts before the instructions are stored in the circuit 102. In some embodiments, the conflicts may be read-before-write resource conflicts. The conflicts may be resolved by inserting no-operation (NOP) instructions into the instruction schedule in step 134, rather than stalling the read instructions in the circuit 106.
- Referring to
FIG. 7, a diagram illustrating another example flow of several instructions through several pipeline stages of the circuit 106 is shown. The example generally illustrates the same conditions as in FIG. 1, where the instruction I1 writes to a resource X from stage S5, and instruction I3 reads from the resource X at stage S3.
- When instruction I1 is detected by the
circuit 142 per step 124, the circuit 142 may detect a write to the resource X from the stage S5 (G=N=5). The instruction I2 may be subsequently examined by the circuit 142. Steps 130 and 132 may find no conflict for the instruction I2, and the instruction I3 may then be examined by the circuit 142. Step 132 may determine that a read-after-write conflict for resource X exists between instructions I1 and I3. Step 134 may calculate that a single cycle delay of instruction I3 may resolve the conflict. Therefore, the circuit 142 may insert a NOP instruction between the instructions I2 and I3 in the step 134. As illustrated in FIG. 7, when instruction I1 writes to the resource X from stage S5 in cycle 5 the instruction I3 is still in stage S2. When instruction I3 reaches stage S3 in cycle 6, the data written by instruction I1 is ready and available without any further delay.
- Where the
circuit 142 encounters the situation of FIG. 2, the results may be the same as illustrated in FIG. 5. The situation of FIG. 2 has instruction I1 writing to the resource X from stage S9, instruction I2 reading the resource X from stage S9, instruction I3 writing to the resource X from stage S3 and instruction I4 reading the resource X from stage S4. The circuit 142 may respond to the instructions I1 to I4 similar to the response of the circuit 104. By following the method 120, the circuit 142 may determine that (i) no read-after-write conflict exists between the instructions I1 and I2, (ii) instruction I3 negates any further read-after-write conflicts for the instruction I1 and (iii) no read-after-write conflict exists between the instructions I3 and I4. Therefore, the circuit 142 does not insert any NOP instructions in the sequence of instructions I1 to I4.
- The functions performed by the diagrams of
FIGS. 3, 4 and 6 may be implemented using one or more of a conventional general purpose processor, digital computer, microprocessor, microcontroller, RISC (reduced instruction set computer) processor, CISC (complex instruction set computer) processor, SIMD (single instruction multiple data) processor, signal processor, central processing unit (CPU), arithmetic logic unit (ALU), video digital signal processor (VDSP) and/or similar computational machines, programmed according to the teachings of the present specification, as will be apparent to those skilled in the relevant art(s). Appropriate software, firmware, coding, routines, instructions, opcodes, microcode, and/or program modules may readily be prepared by skilled programmers based on the teachings of the present disclosure, as will also be apparent to those skilled in the relevant art(s). The software is generally executed from a medium or several media by one or more of the processors of the machine implementation.
- The present invention may also be implemented by the preparation of ASICs (application specific integrated circuits), Platform ASICs, FPGAs (field programmable gate arrays), PLDs (programmable logic devices), CPLDs (complex programmable logic devices), sea-of-gates, RFICs (radio frequency integrated circuits), ASSPs (application specific standard products) or by interconnecting an appropriate network of conventional component circuits, as is described herein, modifications of which will be readily apparent to those skilled in the art(s).
- The present invention thus may also include a computer product which may be a storage medium or media and/or a transmission medium or media including instructions which may be used to program a machine to perform one or more processes or methods in accordance with the present invention. Execution of instructions contained in the computer product by the machine, along with operations of surrounding circuitry, may transform input data into one or more files on the storage medium and/or one or more output signals representative of a physical object or substance, such as an audio and/or visual depiction. The storage medium may include, but is not limited to, any type of disk including floppy disk, hard drive, magnetic disk, optical disk, CD-ROM, DVD and magneto-optical disks and circuits such as ROMs (read-only memories), RAMS (random access memories), EPROMs (electronically programmable ROMs), EEPROMs (electronically erasable ROMs), UVPROM (ultra-violet erasable ROMs), Flash memory, magnetic cards, optical cards, and/or any type of media suitable for storing electronic instructions.
- The elements of the invention may form part or all of one or more devices, units, components, systems, machines and/or apparatuses. The devices may include, but are not limited to, servers, workstations, storage array controllers, storage systems, personal computers, laptop computers, notebook computers, palm computers, personal digital assistants, portable electronic devices, battery powered devices, set-top boxes, encoders, decoders, transcoders, compressors, decompressors, pre-processors, post-processors, transmitters, receivers, transceivers, cipher circuits, cellular telephones, digital cameras, positioning and/or navigation systems, medical equipment, heads-up displays, wireless devices, audio recording, storage and/or playback devices, video recording, storage and/or playback devices, game platforms, peripherals and/or multi-chip modules. Those skilled in the relevant art(s) would understand that the elements of the invention may be implemented in other types of devices to meet the criteria of a particular application. As used herein, the term “simultaneously” is meant to describe events that share some common time period but the term is not meant to be limited to events that begin at the same point in time, end at the same point in time, or have the same duration.
- While the invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the invention.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/855,201 US8499139B2 (en) | 2010-08-12 | 2010-08-12 | Avoiding stall in processor pipeline upon read after write resource conflict when intervening write present |
Publications (2)
Publication Number | Publication Date |
---|---|
US20120042152A1 true US20120042152A1 (en) | 2012-02-16 |
US8499139B2 US8499139B2 (en) | 2013-07-30 |
Family
ID=45565632
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/855,201 Active 2031-10-20 US8499139B2 (en) | 2010-08-12 | 2010-08-12 | Avoiding stall in processor pipeline upon read after write resource conflict when intervening write present |
Country Status (1)
Country | Link |
---|---|
US (1) | US8499139B2 (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: LSI CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DUBROVIN, LEONID;RABINOVITCH, ALEXANDER;MARGOLIN, HAGIT;AND OTHERS;SIGNING DATES FROM 20100810 TO 20100828;REEL/FRAME:025033/0271 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AG Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:LSI CORPORATION;AGERE SYSTEMS LLC;REEL/FRAME:032856/0031 Effective date: 20140506 |
|
FEPP | Fee payment procedure |
Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: INTEL CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LSI CORPORATION;REEL/FRAME:035090/0477 Effective date: 20141114 |
|
AS | Assignment |
Owner name: LSI CORPORATION, CALIFORNIA Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS AT REEL/FRAME NO. 32856/0031;ASSIGNOR:DEUTSCHE BANK AG NEW YORK BRANCH;REEL/FRAME:035797/0943 Effective date: 20150420 |
|
AS | Assignment |
Owner name: LSI CORPORATION, CALIFORNIA Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS (RELEASES RF 032856-0031);ASSIGNOR:DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT;REEL/FRAME:037684/0039 Effective date: 20160201 Owner name: AGERE SYSTEMS LLC, PENNSYLVANIA Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS (RELEASES RF 032856-0031);ASSIGNOR:DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT;REEL/FRAME:037684/0039 Effective date: 20160201 |
|
AS | Assignment |
Owner name: BEIJING XIAOMI MOBILE SOFTWARE CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTEL CORPORATION;REEL/FRAME:037733/0440 Effective date: 20160204 |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |