
US20120042152A1 - Elimination of read-after-write resource conflicts in a pipeline of a processor - Google Patents

Info

Publication number
US20120042152A1
Authority
US
United States
Prior art keywords
instruction
read
write
resource
pipeline
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/855,201
Other versions
US8499139B2 (en
Inventor
Leonid Dubrovin
Alexander Rabinovitch
Hagit Margolin
Noam Abda
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
LSI Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LSI Corp
Priority to US12/855,201
Assigned to LSI CORPORATION: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MARGOLIN, HAGIT; ABDA, NOAM; DUBROVIN, LEONID; RABINOVITCH, ALEXANDER
Publication of US20120042152A1
Application granted
Publication of US8499139B2
Assigned to DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT: PATENT SECURITY AGREEMENT. Assignors: AGERE SYSTEMS LLC, LSI CORPORATION
Assigned to INTEL CORPORATION: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LSI CORPORATION
Assigned to LSI CORPORATION: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS AT REEL/FRAME NO. 32856/0031. Assignors: DEUTSCHE BANK AG NEW YORK BRANCH
Assigned to LSI CORPORATION, AGERE SYSTEMS LLC: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS (RELEASES RF 032856-0031). Assignors: DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT
Assigned to BEIJING XIAOMI MOBILE SOFTWARE CO., LTD.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: INTEL CORPORATION
Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead

Definitions

  • FIG. 1 is a diagram illustrating a conventional single cycle stall;
  • FIG. 2 is a diagram illustrating a conventional three cycle stall;
  • FIG. 3 is a block diagram of a device in accordance with a preferred embodiment of the present invention;
  • FIG. 4 is a flow diagram of an example method for conflict detection;
  • FIG. 5 is a diagram illustrating an example flow of several instructions through several pipeline stages;
  • FIG. 6 is a diagram of an example implementation of a device and a compiler; and
  • FIG. 7 is a diagram illustrating another example flow of several instructions through several pipeline stages.
  • An interlocked pipeline of a processor generally detects and solves resource conflicts.
  • Resource conflicts may happen in a processor when a resource (e.g., a register or a status bit) is read from and written to at different pipeline stages.
  • a read-after-write conflict may occur when a particular stage writes to a resource and an earlier stage attempts to read from the same resource in accordance with a following instruction or instructions.
  • a distance e.g., number of instructions
  • a distance e.g., number of stages or cycles
  • Some embodiments of the present invention may omit an anticipated read-after-write conflict when an additional write to the resource occurs between the write and the subsequent read. Therefore, the number of inserted stalls may be reduced. If the additional write does not cause another read-after-write conflict, the stalls may be eliminated.
  • the device (or apparatus) 100 may implement a pipelined processor.
  • the device 100 may implement an interlocked pipelined processor.
  • the device 100 generally comprises a circuit (or module) 102 , a circuit (or module) 104 , a circuit (or module) 106 and a circuit (or module) 108 .
  • the circuits 102 to 108 may represent modules and/or blocks that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
  • a signal (e.g., INSTRa) may be generated by the circuit 102 and presented to both the circuit 104 and the circuit 106 .
  • the circuit 102 may also generate a signal (e.g., INSTRb) that is received by the circuit 104 .
  • a signal (e.g., STALL) may be created by the circuit 104 and transferred to the circuit 106 .
  • the circuit 106 and the circuit 108 may exchange a signal (e.g., DATA).
  • the circuit 102 may implement an instruction memory circuit.
  • the circuit 102 is generally operational to store instructions that are executed by the device 100 .
  • a sequence of instructions may be presented to the circuits 104 and 106 in the signal INSTRa. Some instructions may be presented in the signal INSTRb to the circuit 104 .
  • the circuit 104 generally implements a conflict detector circuit.
  • the circuit 104 is generally operational to search for and detect resource conflicts among the instructions currently being executed and/or about to be executed. As instructions are loaded from the circuit 102 into the circuit 106 , a copy of each current instruction may be examined by the circuit 104 . Each current instruction may be compared against one or more other instructions received in the signal INSTRb. The other instructions are generally scheduled to follow the current instruction.
  • the circuit 104 may be operational to detect read-after-write conflicts among the instructions. In certain cases, the read-after-write conflicts may be anticipated and later canceled as an intervening write may negate the conflict. For each conflict detected, the circuit 104 may generate corresponding information in the signal STALL. The information generally informs the circuit 106 how to overcome the conflict. For example, a read-after-write conflict may be overcome by stalling the read instruction until the written data has been updated and is ready to be read.
  • the circuit 106 may implement a pipelined processor circuit having multiple stages.
  • the circuit 106 is generally operational to process multiple instructions simultaneously, generally a different instruction in each of the stages.
  • the instructions may be received from the circuit 102 through the signal INSTRa.
  • Data generated and/or read by the executing instructions may be exchanged with the circuit 108 via the signal DATA.
  • the circuit 106 may be operational to deal with resource conflicts based on the information received in the signal STALL. For example, the circuit 106 may stall a resource read operation in a particular stage for one or more cycles. The stalls generally allow a resource write operation sufficient time to write the data into the resource from a different pipeline stage of the circuit 106 .
  • the circuit 108 may implement a data memory circuit.
  • the circuit 108 is generally operational to buffer data for the circuit 106 via the signal DATA.
  • the circuit 108 may be divided into individual bits, registers, blocks and/or pages. Data written to the circuit 108 from the circuit 106 may be available to the circuit 106 a clock cycle after being written. In some designs, part to all of the circuit 108 may be integrated into the circuit 106 .
  • the method (or process) 120 generally comprises a step (or block) 122 , a step (or block) 124 , a step (or block) 126 , a step (or block) 128 , a step (or block) 130 , a step (or block) 132 , a step (or block) 134 , a step (or block) 136 and a step (or block) 138 .
  • the method 120 may be implemented by the circuit 104 .
  • the steps 122 to 138 may represent modules and/or blocks that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
  • the method 120 generally starts at the step 122 waiting for an instruction.
  • the circuit 104 may detect a current write instruction in the signal INSTRa.
  • the current write instruction generally writes to a resource (e.g., resource X) at a particular stage (e.g., stage N) of the circuit 106 .
  • Stage N may be any stage in the pipeline.
  • the current write instruction may be in a sequence of instructions stored in the circuit 102 .
  • the current write instruction may or may not be the beginning of a read-after-write conflict, depending on the subsequently-scheduled instructions in the sequence.
  • the circuit 104 may look at the next instruction in the sequence in the step 128 .
  • the next instruction may be obtained from the circuit 102 via the signal INSTRb.
  • a check may be performed in the step 130 to see if the next instruction is another write instruction. If the next instruction is (i) not a write to the resource X and/or (ii) not at a stage earlier than stage G (e.g., the NO block of step 130 ), the method 120 may proceed to step 132 .
  • In the step 132 , if the next instruction is all of (i) a read instruction (ii) from resource X at a stage (e.g., stage M) and (iii) stage M is earlier in the pipeline than stage G (e.g., the YES branch of step 132 ), the circuit 104 generally concludes that a read-after-write conflict exists between the current write instruction and the read instruction.
  • stage M may be any stage in the pipeline.
  • the circuit 104 may assert an interlock in step 134 for a total of G-M cycles.
  • the circuit 104 may convey the interlock assertion to the circuit 106 in the signal STALL.
  • the signal STALL generally commands the circuit 106 to stall the read instruction at the stage M for a calculated number of cycles.
  • the stall may delay the read instruction long enough to allow the current write instruction to load (write, transfer, place) the data into the circuit 108 from stage N and the data to become available to read.
  • a check may be performed in the step 138 to determine if more instructions should be examined or not. If the variable G has not reached zero (e.g., the NO branch of step 138 ), more instructions should be examined.
  • the circuit 104 may obtain the next scheduled instruction from the circuit 102 via the signal INSTRb. Thereafter, the method 120 may continue with the step 130 .
  • the circuit 104 may conclude that no read-after-write conflict exists for the current write instruction at the stage N.
  • the method 120 thus returns to the step 122 and waits for another instruction in the signal INSTRa.
  • the circuit 104 may conclude that the current write instruction does not cause a read-after-write conflict. Therefore, the method 120 may return to the step 122 and wait for another instruction in the signal INSTRa. When the subsequent write instruction from step 130 appears in the signal INSTRa, the circuit 104 may begin another read-after-write conflict test based on the subsequent write instruction.
  • Circuit 104 may run the method 120 independently for every executed instruction.
  • the method 120 generally accounts for one or more write instructions between the current write instruction and a later read instruction. Consequently, a potential read-after-write conflict between the current write instruction and the later read instruction may be cancelled by an intermediate (e.g., subsequent) write instruction. Cancellation of the potential read-after-write resource conflict generally reduces the number of stalls performed by the circuit 106 . Therefore, a performance of the processor 106 may be increased. In some cases, however, a new read-after-write conflict may be detected between the intermediate write instruction and the later read instruction.
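  • The scan performed by the method 120 can be sketched in Python as follows. This is a minimal illustration, not the circuit 104 itself: the `Instr` record and the function name `method_120` are hypothetical, and the initialization of the counter G to the write stage N is an assumption inferred from the FIG. 1 walk-through, in which G is decremented from 5 to 4 in step 136.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    name: str
    op: str                          # "write", "read" or "other"
    resource: Optional[str] = None   # e.g. "R1" or "T"
    stage: int = 0                   # stage of the access (S5 -> 5)


def method_120(write: Instr, following: Sequence[Instr]) -> Tuple[Optional[str], int]:
    """Return (name of the read to stall, stall cycles), or (None, 0)."""
    g = write.stage                  # assumed: counter G starts at write stage N
    for nxt in following:            # step 128: look at the next instruction
        same = nxt.resource == write.resource
        # Step 130: another write to the resource at a stage earlier than G
        # cancels any conflict caused by the current write instruction.
        if nxt.op == "write" and same and nxt.stage < g:
            return (None, 0)
        # Step 132: a read of the resource at a stage M earlier than G
        # is a read-after-write conflict; step 134 stalls G - M cycles.
        if nxt.op == "read" and same and nxt.stage < g:
            return (nxt.name, g - nxt.stage)
        g -= 1                       # step 136: one instruction further away
        if g == 0:                   # step 138: no later instruction can conflict
            break
    return (None, 0)


# FIG. 1 conditions: I1 writes R1 at S5, I3 reads R1 at S3.
fig1 = [Instr("I1", "write", "R1", 5), Instr("I2", "other"),
        Instr("I3", "read", "R1", 3)]
print(method_120(fig1[0], fig1[1:]))

# FIG. 5 conditions: the write by I3 at S3 cancels the I1/I4 conflict.
fig5 = [Instr("I1", "write", "T", 9), Instr("I2", "read", "T", 9),
        Instr("I3", "write", "T", 3), Instr("I4", "read", "T", 4)]
print(method_120(fig5[0], fig5[1:]))
print(method_120(fig5[2], fig5[3:]))
```

  • Under these assumptions the FIG. 1 sequence yields a one-cycle stall of I3, while neither I1 nor I3 in the FIG. 5 sequence produces a stall, which agrees with the walk-throughs in the text.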
  • the method 120 may account for nested read-after-write conflicts.
  • a read-after-write conflict for the resource X may have a read-after-write conflict for another resource (e.g., resource Y) nested within it.
  • a write to the resource Y may take place after the write to the resource X.
  • a read from the resource Y may occur before the read from the resource X.
  • the NO branch may be selected as the write is not to the resource X.
  • the NO branch may be selected as the read is not from the resource X. Therefore, the read-after-write conflict for resource Y may have no impact on the read-after-write conflict for resource X.
  • the method 120 may produce the same results as shown in FIG. 1 where a true read-after-write resource conflict exists.
  • the instruction I 1 writes to a resource X from stage S 5 and the instruction I 3 reads the resource X from earlier stage S 3 .
  • the instruction I 2 may be obtained by the circuit 104 in step 128 . Steps 130 and 132 may determine that instructions I 1 and I 2 do not cause a read-after-write conflict.
  • the counter variable G may be decremented from 5 to 4 in step 136 and instruction I 3 obtained in step 128 .
  • the circuit 104 may determine in the step 132 that the instructions I 1 and I 3 cause a read-after-write conflict.
  • the instruction I 1 may write to the resource X from stage S 5 and the circuit 104 may generate a stall in step 134 that prevents instruction I 3 from reading the resource X until cycle 6 .
  • FIG. 5 a diagram illustrating an example flow of several instructions through several pipeline stages of the circuit 106 is shown.
  • the example generally illustrates the conditions shown in FIG. 2 where the instruction I 1 writes to a resource X from stage S 9 , the instruction I 2 reads the resource X at stage S 9 , the instruction I 3 writes to same resource X from stage S 3 and the instruction I 4 reads the resource X at stage S 4 .
  • the counter variable G may be decremented from 9 to 8 in the step 136 and the instruction I 3 may be obtained from the circuit 102 in the step 128 .
  • the circuit 104 may detect that instruction I 3 is a write to the resource X at stage S 3 , which is before stage S 9 . Therefore, the circuit 104 may conclude in step 130 that instruction I 1 is not involved in any read-after-write conflicts for the resource X even though instruction I 4 would normally cause such a conflict.
  • the device 140 may implement a pipelined processor. In some embodiments, the device 140 may implement an exposed pipelined processor.
  • the device 140 generally comprises the circuit 102 , the circuit 106 and the circuit 108 .
  • a circuit (or module) 142 may be in communication with the device 140 through a signal (e.g., INSTRc).
  • the circuit 142 may represent a module and/or block that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
  • the circuit 142 may implement a compiler circuit with a conflict detection capability.
  • the circuit 142 is generally operational to perform the method 120 and adjust the sequence of instructions accordingly to resolve conflicts before the instructions are stored in the circuit 102 .
  • the conflicts may be read-after-write resource conflicts.
  • the conflicts may be resolved by inserting no-operation (NOP) instructions into the instruction schedule in step 134 , rather than stalling the read instructions in the circuit 106 .
  • FIG. 7 a diagram illustrating another example flow of several instructions through several pipeline stages of the circuit 106 is shown.
  • the example generally illustrates the same conditions as in FIG. 1 , where the instruction I 1 writes to a resource X from stage S 5 , and instruction I 3 reads from the resource X at stage S 3 .
  • the instruction I 2 may be subsequently examined by the circuit 142 .
  • Steps 130 and 132 generally determine that instruction I 2 is not involved in a read-after-write resource conflict with instruction I 1 . Therefore, step 136 may reduce the counter variable G from 5 to 4.
  • Instruction I 3 may be subsequently examined by the circuit 142 .
  • Step 132 may determine that a read-after-write conflict for resource X exists between instructions I 1 and I 3 .
  • Step 134 may calculate that a single cycle delay of instruction I 3 may resolve the conflict.
  • the circuit 142 may insert a NOP instruction between the instructions I 2 and I 3 in the step 134 .
  • As illustrated in FIG. 7 , when instruction I 1 writes to the resource X from stage S 5 in cycle 5 , the instruction I 3 is still in stage S 2 . When instruction I 3 reaches stage S 3 in cycle 6 , the data written by instruction I 1 is ready and available without any further delay.
  • the circuit 142 may respond to the instructions I 1 to I 4 similar to the response of the circuit 104 .
  • the circuit 142 may determine that (i) no read-after-write conflict exists between the instructions I 1 and I 2 , (ii) instruction I 3 negates any further read-after-write conflicts for the instruction I 1 and (iii) no read-after-write conflict exists between the instructions I 3 and I 4 . Therefore, the circuit 142 does not insert any NOP instructions in the sequence of instructions I 1 to I 4 .
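  • The compile-time variant can be sketched in the same way. The code below is a hedged illustration of how a tool like the circuit 142 might pad a schedule with NOP instructions instead of asserting stalls; the `Instr` record, the `NOP` constant and the function `insert_nops` are hypothetical names, and the counter G is again assumed to start at the write stage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    name: str
    op: str                          # "write", "read", "other" or "nop"
    resource: Optional[str] = None
    stage: int = 0                   # stage of the access (S5 -> 5)


NOP = Instr("NOP", "nop")


def insert_nops(program: List[Instr]) -> List[Instr]:
    """Return a copy of the schedule padded with NOPs where reads would stall."""
    out = list(program)
    i = 0
    while i < len(out):
        cur = out[i]
        if cur.op == "write":
            g = cur.stage            # assumed: counter G starts at the write stage
            j = i + 1
            while j < len(out) and g > 0:
                nxt = out[j]
                same = nxt.resource == cur.resource
                if nxt.op == "write" and same and nxt.stage < g:
                    break            # step 130: intermediate write cancels the conflict
                if nxt.op == "read" and same and nxt.stage < g:
                    # Step 134, compiler form: delay the read by G - M NOPs.
                    out[j:j] = [NOP] * (g - nxt.stage)
                    break
                g -= 1               # step 136
                j += 1
        i += 1
    return out


# FIG. 7 conditions: I1 writes X at S5, I3 reads X at S3.
fig7 = [Instr("I1", "write", "X", 5), Instr("I2", "other"),
        Instr("I3", "read", "X", 3)]
print([ins.name for ins in insert_nops(fig7)])

# I1 to I4 with an intermediate write by I3: no padding is expected.
seq = [Instr("I1", "write", "X", 9), Instr("I2", "read", "X", 9),
       Instr("I3", "write", "X", 3), Instr("I4", "read", "X", 4)]
print([ins.name for ins in insert_nops(seq)])
```

  • For the FIG. 7 conditions the sketch places a single NOP between I2 and I3, and for the four-instruction sequence it leaves the schedule unchanged, consistent with the behaviour described for the circuit 142 .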
  • FIGS. 3 , 4 and 6 may be implemented using one or more of a conventional general purpose processor, digital computer, microprocessor, microcontroller, RISC (reduced instruction set computer) processor, CISC (complex instruction set computer) processor, SIMD (single instruction multiple data) processor, signal processor, central processing unit (CPU), arithmetic logic unit (ALU), video digital signal processor (VDSP) and/or similar computational machines, programmed according to the teachings of the present specification, as will be apparent to those skilled in the relevant art(s).
  • the present invention may also be implemented by the preparation of ASICs (application specific integrated circuits), Platform ASICs, FPGAs (field programmable gate arrays), PLDs (programmable logic devices), CPLDs (complex programmable logic devices), sea-of-gates, RFICs (radio frequency integrated circuits), ASSPs (application specific standard products) or by interconnecting an appropriate network of conventional component circuits, as is described herein, modifications of which will be readily apparent to those skilled in the art(s).
  • the present invention thus may also include a computer product which may be a storage medium or media and/or a transmission medium or media including instructions which may be used to program a machine to perform one or more processes or methods in accordance with the present invention.
  • Execution of instructions contained in the computer product by the machine, along with operations of surrounding circuitry may transform input data into one or more files on the storage medium and/or one or more output signals representative of a physical object or substance, such as an audio and/or visual depiction.
  • the storage medium may include, but is not limited to, any type of disk including floppy disk, hard drive, magnetic disk, optical disk, CD-ROM, DVD and magneto-optical disks and circuits such as ROMs (read-only memories), RAMs (random access memories), EPROMs (electronically programmable ROMs), EEPROMs (electronically erasable ROMs), UVPROM (ultra-violet erasable ROMs), Flash memory, magnetic cards, optical cards, and/or any type of media suitable for storing electronic instructions.
  • the elements of the invention may form part or all of one or more devices, units, components, systems, machines and/or apparatuses.
  • the devices may include, but are not limited to, servers, workstations, storage array controllers, storage systems, personal computers, laptop computers, notebook computers, palm computers, personal digital assistants, portable electronic devices, battery powered devices, set-top boxes, encoders, decoders, transcoders, compressors, decompressors, pre-processors, post-processors, transmitters, receivers, transceivers, cipher circuits, cellular telephones, digital cameras, positioning and/or navigation systems, medical equipment, heads-up displays, wireless devices, audio recording, storage and/or playback devices, video recording, storage and/or playback devices, game platforms, peripherals and/or multi-chip modules.
  • the elements of the invention may be implemented in other types of devices to meet the criteria of a particular application.
  • the term “simultaneously” is meant to describe events that share some common time period but the term is not meant to be limited to events that begin at the same point in time, end at the same point in time, or have the same duration.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

An apparatus having a processor and a circuit is disclosed. The processor generally has a pipeline. The circuit may be configured to (i) detect a first write instruction in the pipeline that writes to a resource, (ii) stall a read instruction in the pipeline where (a) a first read-after-write conflict exists between the first write instruction and the read instruction and (b) no other write instruction to the resource is scheduled between the first write instruction and the read instruction and (iii) not stall the read instruction due to the first read-after-write conflict where a second write instruction to the resource is scheduled between the first write instruction and the read instruction.

Description

    FIELD OF THE INVENTION
  • The present invention relates to pipelined processors generally and, more particularly, to a method and/or apparatus for implementing an elimination of read-after-write resource conflicts in a pipeline of a processor.
  • BACKGROUND OF THE INVENTION
  • A read-after-write conflict occurs in a pipelined processor when an instruction writes to a register or a status bit, and a succeeding instruction attempts to read from the register or status bit before the register or the status bit has actually been updated. Consider the following examples based on a pipeline with 10 stages and the StarCore assembler language. A typical read-after-write conflict occurs where an instruction I2 attempts to read from a register R1 before an instruction I1 writes to the register R1:
  • I1: MOVE,L D1,R1 ; Write R1 at stage S5
  • I2: MOVE,L (R1),R2 ; Read R1 at stage S3
  • When the instruction I2 reaches stage S3 and is ready to read the register R1, the instruction I1 is only located at stage S4. The register R1 does not become available to read until a cycle after the instruction I1 is executed and the data D1 is written to the register R1. Therefore an interlock mechanism of the processor will stall the instruction I2 for two cycles until the data D1 is in the register R1 and available to read.
  • Referring to FIG. 1, a diagram illustrating a conventional single cycle stall is shown. In the example, the instructions in conflict are a write instruction I1 (i.e., MOVE,L D1,R1) and a read instruction I3 (i.e., MOVE,L (R1),R2). An additional instruction I2 (i.e., ADD D4,D5) exists between the instructions I1 and I3. The instructions I1 to I3 progress through the early stages of the pipeline as normal in cycles 1 through 4. In cycle 5, the instruction I3 reaches stage S3 and is ready to read the register R1 and the instruction I1 writes to the register R1 from stage S5. Since the register R1 will not have the data D1 calculated by the instruction I1 stored within until a cycle after the write by the instruction I1, execution of the instruction I3 is delayed until cycle 6 before reading from the register R1. Therefore, the interlock mechanism generates a single cycle stall of the instruction I3. The number of cycles for which a read instruction is stalled depends on what stage the register is being read from, at what stage the register is being written to, and the distance between the two instructions that cause the conflict.
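  • The dependence of the stall length on the read stage, the write stage and the instruction distance can be made concrete with a small calculation. The text gives no formula, so the one below is an assumption derived from the examples: one instruction enters the pipeline per cycle, written data becomes readable one cycle after the write stage, and `distance` counts instructions from the writer to the reader (1 means adjacent). The function name `conventional_stall_cycles` is hypothetical.

```python
def conventional_stall_cycles(write_stage: int, read_stage: int, distance: int) -> int:
    """Cycles a read waits behind an earlier write to the same resource.

    Assumed model: the writer is in stage s during cycle s, the reader trails
    it by `distance` cycles, and the value is readable one cycle after the
    write stage. The reader reaches its read stage in cycle
    read_stage + distance and may read from cycle write_stage + 1 onward.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return max(0, write_stage + 1 - read_stage - distance)


# Adjacent MOVE pair from the background: write at S5, read at S3.
print(conventional_stall_cycles(5, 3, 1))
# FIG. 1: one instruction sits between the writer and the reader.
print(conventional_stall_cycles(5, 3, 2))
# FIG. 2: bit T written at S9 and read at S4, three instructions later.
print(conventional_stall_cycles(9, 4, 3))
```

  • Under these assumptions the three calls print 2, 1 and 3, matching the two-cycle, single-cycle and three-cycle stalls described for the conventional interlock.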
  • Referring to FIG. 2, a diagram illustrating a conventional three cycle stall is shown. In the example, the instructions in conflict are a write instruction I1 (i.e., CMPEQ D0,D1) to a bit T and a read instruction I4 (i.e., IFT ADDA R6,R7) of the bit T. Another write instruction I3 (i.e., CMPEQA R0,R1) that writes to the bit T resides between the instructions I1 and I4. An instruction I2 (i.e., IFT ADD D6,D7,D8) that reads the bit T is located between the instructions I1 and I3.
  • In cycle 4, the instruction I4 reaches stage S4 and the instruction I1 reaches stage S7. Since the instruction I1 does not write until stage S9 in cycle 6, a three-cycle stall is generated by the interlock mechanism. The three-cycle stall delays the instruction I4 from reading the bit T until cycle 7. The interlock mechanism does not account for the instruction I3 writing to the bit T from stage S3 in cycle 2. The interlock mechanism generates the three-cycle stall regardless of the presence or absence of the write instruction I3.
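  • The dependence of the conventional stall count on the write stage, the read stage and the instruction distance can be illustrated with a short sketch. The closed-form expression below is not given in the text; it is inferred from the examples above under two assumptions: one instruction enters the pipeline per cycle, and written data becomes readable one cycle after the write. The function name `conventional_stall` is hypothetical.

```python
def conventional_stall(write_stage: int, read_stage: int, distance: int) -> int:
    """Stall cycles a conventional interlock inserts for one write/read pair.

    Assumptions (inferred, not stated as a formula in the text):
      * one instruction enters the pipeline per cycle, so the reader trails
        the writer by `distance` stages;
      * data written in a cycle becomes readable in the following cycle.
    """
    # The writer reaches its write stage (write_stage - read_stage - distance)
    # cycles after the reader first reaches its read stage; one further cycle
    # passes before the written data can be read.
    return max(0, write_stage - read_stage - distance + 1)


print(conventional_stall(5, 3, 1))  # adjacent write/read pair: 2
print(conventional_stall(5, 3, 2))  # FIG. 1 (I1 and I3): 1
print(conventional_stall(9, 4, 3))  # FIG. 2 (I1 and I4): 3
```

  • The three calls reproduce the two-cycle, single-cycle and three-cycle stalls described above.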
  • SUMMARY OF THE INVENTION
  • The present invention concerns an apparatus having a processor and a circuit. The processor generally has a pipeline.
  • The circuit may be configured to (i) detect a first write instruction in the pipeline that writes to a resource, (ii) stall a read instruction in the pipeline where (a) a first read-after-write conflict exists between the first write instruction and the read instruction and (b) no other write instruction to the resource is scheduled between the first write instruction and the read instruction and (iii) not stall the read instruction due to the first read-after-write conflict where a second write instruction to the resource is scheduled between the first write instruction and the read instruction.
  • The objects, features and advantages of the present invention include providing a method and/or apparatus for implementing an elimination of read-after-write resource conflicts in a pipeline of a processor that may (i) take into account intermediate instructions between conflicting instructions, (ii) influence access to resources, (iii) influence an update stage of the resources, (iv) eliminate read-after-write conflicts in certain cases, (v) reduce a number of stalls in a pipelined processor and/or (vi) improve performance of a pipelined processor.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • These and other objects, features and advantages of the present invention will be apparent from the following detailed description and the appended claims and drawings in which:
  • FIG. 1 is a diagram illustrating a conventional single cycle stall;
  • FIG. 2 is a diagram illustrating a conventional three cycle stall;
  • FIG. 3 is a block diagram of a device in accordance with a preferred embodiment of the present invention;
  • FIG. 4 is a flow diagram of an example method for conflict detection;
  • FIG. 5 is a diagram illustrating an example flow of several instructions through several pipeline stages;
  • FIG. 6 is a diagram of an example implementation of a device and a compiler; and
  • FIG. 7 is a diagram illustrating another example flow of several instructions through several pipeline stages.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • An interlocked pipeline of a processor generally detects and resolves resource conflicts. Resource conflicts may happen in a processor when a resource (e.g., a register or a status bit) is read from and written to at different pipeline stages. A read-after-write conflict may occur when a particular stage writes to a resource and an earlier stage attempts to read from the same resource in accordance with a following instruction or instructions. In general, (i) a distance (e.g., number of instructions) between the read instruction and the write instruction and (ii) a distance (e.g., number of stages or cycles) between the writing stage and the reading stage in the pipeline define a number of interlock (e.g., stall) cycles that resolve the read-after-write conflict. Some embodiments of the present invention may omit an anticipated read-after-write conflict when an additional write to the resource occurs between the write and the read. Therefore, the number of inserted stalls may be reduced. If the additional write does not cause another read-after-write conflict, the stalls may be eliminated.
  • Referring to FIG. 3, a block diagram of a device 100 is shown in accordance with a preferred embodiment of the present invention. The device (or apparatus) 100 may implement a pipelined processor. In some embodiments, the device 100 may implement an interlocked pipelined processor. The device 100 generally comprises a circuit (or module) 102, a circuit (or module) 104, a circuit (or module) 106 and a circuit (or module) 108. The circuits 102 to 108 may represent modules and/or blocks that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
  • A signal (e.g., INSTRa) may be generated by the circuit 102 and presented to both the circuit 104 and the circuit 106. The circuit 102 may also generate a signal (e.g., INSTRb) that is received by the circuit 104. A signal (e.g., STALL) may be created by the circuit 104 and transferred to the circuit 106. The circuit 106 and the circuit 108 may exchange a signal (e.g., DATA).
  • The circuit 102 may implement an instruction memory circuit. The circuit 102 is generally operational to store instructions that are executed by the device 100. A sequence of instructions may be presented to the circuits 104 and 106 in the signal INSTRa. Some instructions may be presented in the signal INSTRb to the circuit 104.
  • The circuit 104 generally implements a conflict detector circuit. The circuit 104 is generally operational to search for and detect resource conflicts among the instructions currently being executed and/or about to be executed. As instructions are loaded from the circuit 102 into the circuit 106, a copy of each current instruction may be examined by the circuit 104. Each current instruction may be compared against one or more other instructions received in the signal INSTRb. The other instructions are generally scheduled to follow the current instruction. In some embodiments, the circuit 104 may be operational to detect read-after-write conflicts among the instructions. In certain cases, the read-after-write conflicts may be anticipated and later canceled as an intervening write may negate the conflict. For each conflict detected, the circuit 104 may generate corresponding information in the signal STALL. The information generally informs the circuit 106 how to overcome the conflict. For example, a read-after-write conflict may be overcome by stalling the read instruction until the written data has been updated and is ready to be read.
  • The circuit 106 may implement a pipelined processor circuit having multiple stages. The circuit 106 is generally operational to process multiple instructions simultaneously, generally a different instruction in each of the stages. The instructions may be received from the circuit 102 through the signal INSTRa. Data generated and/or read by the executing instructions may be exchanged with the circuit 108 via the signal DATA. The circuit 106 may be operational to deal with resource conflicts based on the information received in the signal STALL. For example, the circuit 106 may stall a resource read operation in a particular stage for one or more cycles. The stalls generally allow a resource write operation sufficient time to write the data into the resource from a different pipeline stage of the circuit 106.
  • The circuit 108 may implement a data memory circuit. The circuit 108 is generally operational to buffer data for the circuit 106 via the signal DATA. The circuit 108 may be divided into individual bits, registers, blocks and/or pages. Data written to the circuit 108 from the circuit 106 may be available to the circuit 106 a clock cycle after being written. In some designs, part to all of the circuit 108 may be integrated into the circuit 106.
  • Referring to FIG. 4, a flow diagram of an example method 120 for conflict detection is shown. The method (or process) 120 generally comprises a step (or block) 122, a step (or block) 124, a step (or block) 126, a step (or block) 128, a step (or block) 130, a step (or block) 132, a step (or block) 134, a step (or block) 136 and a step (or block) 138. The method 120 may be implemented by the circuit 104. The steps 122 to 138 may represent modules and/or blocks that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
  • The method 120 generally starts at the step 122 waiting for an instruction. In the step 124, the circuit 104 may detect a current write instruction in the signal INSTRa. The current write instruction generally writes to a resource (e.g., resource X) at a particular stage (e.g., stage N) of the circuit 106. Stage N may be any stage in the pipeline. The current write instruction may be in a sequence of instructions stored in the circuit 102. Upon detection of the current write instruction, the circuit 104 may set a counter variable (e.g., G) in the step 126 to match the stage number N from which the write takes place (e.g., G=N). The current write instruction may or may not be the beginning of a read-after-write conflict, depending on the subsequently-scheduled instructions in the sequence.
  • The circuit 104 may look at the next instruction in the sequence in the step 128. The next instruction may be obtained from the circuit 102 via the signal INSTRb. A check may be performed in the step 130 to see if the next instruction is another write instruction. If the next instruction is (i) not a write to the resource X and/or (ii) not at a stage earlier than stage G (e.g., the NO branch of step 130), the method 120 may proceed to step 132. In the step 132, if the next instruction is all of (i) a read instruction (ii) from resource X at a stage (e.g., stage M) and (iii) stage M is earlier in the pipeline than stage G (e.g., the YES branch of step 132), the circuit 104 generally concludes that a read-after-write conflict exists between the current write instruction and the read instruction. The stage M may be any stage in the pipeline.
  • In response to detecting the read-after-write conflict, the circuit 104 may assert an interlock in step 134 for a total of G-M cycles. The circuit 104 may convey the interlock assertion to the circuit 106 in the signal STALL. The signal STALL generally commands the circuit 106 to stall the read instruction at the stage M for a calculated number of cycles. The stall may delay the read instruction long enough to allow the current write instruction to load (write, transfer, place) the data into the circuit 108 from stage N and the data to become available to read.
  • If the next instruction is (i) not a read instruction and/or (ii) not a read from stage M (e.g., the NO branch of step 132), the circuit 104 may decrement the counter variable G (e.g., G=G−1) in the step 136. A check may be performed in the step 138 to determine if more instructions should be examined or not. If the variable G has not reached zero (e.g., the NO branch of step 138), more instructions should be examined. In the step 128, the circuit 104 may obtain the next scheduled instruction from the circuit 102 via the signal INSTRb. Thereafter, the method 120 may continue with the step 130. If the counter variable G has reached zero (e.g., the YES branch in the step 138), the circuit 104 may conclude that no read-after-write conflict exists for the current write instruction at the stage N. The method 120 thus returns to the step 122 and waits for another instruction in the signal INSTRa.
  • Returning to the step 130, if the next instruction is (i) a subsequent write instruction to the resource X and (ii) occurs at an earlier stage than stage G (e.g., the YES branch of step 130), the circuit 104 may conclude that the current write instruction does not cause a read-after-write conflict. Therefore, the method 120 may return to the step 122 and wait for another instruction in the signal INSTRa. When the subsequent write instruction from step 130 appears in the signal INSTRa, the circuit 104 may begin another read-after-write conflict test based on the subsequent write instruction.
  • The circuit 104 may run the method 120 independently for every executed instruction. The method 120 generally accounts for one or more write instructions between the current write instruction and a later read instruction. Consequently, a potential read-after-write conflict between the current write instruction and the later read instruction may be cancelled by an intermediate (e.g., subsequent) write instruction. Cancellation of the potential read-after-write resource conflict generally reduces the number of stalls performed by the circuit 106. Therefore, the performance of the circuit 106 may be increased. In some cases, however, a new read-after-write conflict may be detected between the intermediate write instruction and the later read instruction.
  • By considering each executed instruction independently, the method 120 may account for nested read-after-write conflicts. For example, a read-after-write conflict for the resource X may have nested within it a read-after-write conflict for another resource (e.g., resource Y). In particular, a write to the resource Y may take place after the write to the resource X. A read from the resource Y may occur before the read from the resource X. When the write to resource Y is considered at the step 130, the NO branch may be selected as the write is not to the resource X. When the read from resource Y is considered at the step 132, the NO branch may be selected as the read is not from the resource X. Therefore, the read-after-write conflict for resource Y may have no impact on the read-after-write conflict for resource X.
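  • The steps 122 to 138 described above can be sketched in software as follows. This is a minimal illustration only, not the circuit 104 itself: the `Instr` record, its field names and the function `find_conflict` are hypothetical, and each instruction is assumed to read and/or write at most one resource.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record; field names are illustrative only."""
    name: str
    writes: Optional[str] = None  # resource written, if any
    write_stage: int = 0          # stage N from which the write takes place
    reads: Optional[str] = None   # resource read, if any
    read_stage: int = 0           # stage M at which the read takes place


def find_conflict(seq: List[Instr], i: int) -> Optional[Tuple[int, int]]:
    """Apply steps 124 to 138 to seq[i].

    Returns (index of the conflicting read, stall cycles G - M), or None when
    no read-after-write conflict is attributed to seq[i].
    """
    cur = seq[i]
    if cur.writes is None:            # step 124: only a write starts a test
        return None
    g = cur.write_stage               # step 126: G = N
    for j in range(i + 1, len(seq)):  # step 128: next scheduled instruction
        nxt = seq[j]
        # Step 130: a later write to the same resource from a stage earlier
        # than G cancels any conflict for the current write.
        if nxt.writes == cur.writes and nxt.write_stage < g:
            return None
        # Step 132: a read of the same resource at a stage earlier than G.
        if nxt.reads == cur.writes and nxt.read_stage < g:
            return j, g - nxt.read_stage  # step 134: interlock for G - M cycles
        g -= 1                        # step 136
        if g == 0:                    # step 138: nothing further to examine
            return None
    return None


# Write from stage S5, one unrelated instruction, then a read at stage S3.
example = [
    Instr("I1", writes="R1", write_stage=5),
    Instr("I2"),
    Instr("I3", reads="R1", read_stage=3),
]
print(find_conflict(example, 0))  # (2, 1): stall the read at index 2 for one cycle
```

  • Calling `find_conflict` once per instruction index mirrors running the method 120 independently for every instruction.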
  • The method 120 may produce the same results as shown in FIG. 1 where a true read-after-write resource conflict exists. By way of example, consider the situation of FIG. 1 where the instruction I1 writes to a resource X from stage S5 and the instruction I3 reads the resource X from earlier stage S3. As the instruction I1 is transferred from the circuit 102 to the circuit 106, the circuit 104 may detect the write to resource X from stage S5 per step 124 (G=N=5). The instruction I2 may be obtained by the circuit 104 in step 128. Steps 130 and 132 may determine that instructions I1 and I2 do not cause a read-after-write conflict.
  • The counter variable G may be decremented from 5 to 4 in step 136 and instruction I3 obtained in step 128. The circuit 104 may determine in the step 132 that the instructions I1 and I3 cause a read-after-write conflict. In cycle 5, the instruction I1 may write to the resource X from stage S5 and the circuit 104 may generate a stall in step 134 that prevents instruction I3 from reading the resource X until cycle 6.
  • Referring to FIG. 5, a diagram illustrating an example flow of several instructions through several pipeline stages of the circuit 106 is shown. The example generally illustrates the conditions shown in FIG. 2 where the instruction I1 writes to a resource X from stage S9, the instruction I2 reads the resource X at stage S9, the instruction I3 writes to the same resource X from stage S3 and the instruction I4 reads the resource X at stage S4.
  • When the instruction I1 is copied into the pipeline in the circuit 106, the circuit 104 may detect the write to resource X at the stage S9 per step 124 (G=N=9). The circuit 104 may obtain a copy of the instruction I2 in the step 128. Steps 130 and 132 generally determine that instructions I1 and I2 do not cause a read-after-write conflict because instruction I2 reads from the resource X in cycle 7, which is after instruction I1 writes to the resource in cycle 6. The counter variable G may be decremented from 9 to 8 in the step 136 and the instruction I3 may be obtained from the circuit 102 in the step 128. The circuit 104 may detect that instruction I3 is a write to the resource X at stage S3, which is before stage S9. Therefore, the circuit 104 may conclude in step 130 that instruction I1 is not involved in any read-after-write conflicts for the resource X even though instruction I4 would normally cause such a conflict.
  • When the instruction I3 is transferred to the circuit 106, the circuit 104 may detect the write to the resource X at the stage S3 per step 124 (G=N=3). Steps 130 and 132 may determine that the instruction I4 does not cause a read-after-write conflict with the instruction I3 because the instruction I4 reads the resource X in cycle 4, which is well after instruction I3 writes to the resource X in cycle 2. Due to the operations of the circuit 104, the circuit 106 may not be stalled for the three cycles as shown in FIG. 2, resulting in better processing efficiency.
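  • The cycle numbers quoted for FIG. 5 can be checked with a short calculation. The sketch below assumes that one instruction enters the pipeline per cycle and that no stalls occur, and it anchors the timeline on the statement that the instruction I1 is at stage S7 in cycle 4. The helper name `cycle_at` and its parameters are hypothetical.

```python
def cycle_at(position: int, stage: int, ref_stage: int = 7, ref_cycle: int = 4) -> int:
    """Cycle in which the instruction at `position` (0 for I1) occupies `stage`.

    Assumes one instruction issued per cycle and no stalls. The timeline is
    anchored on I1 occupying stage S7 in cycle 4, as stated in the text.
    """
    return ref_cycle + (stage - ref_stage) + position


i1_write = cycle_at(0, 9)  # I1 writes from stage S9
i2_read = cycle_at(1, 9)   # I2 reads at stage S9
i3_write = cycle_at(2, 3)  # I3 writes from stage S3
i4_read = cycle_at(3, 4)   # I4 reads at stage S4

print(i1_write, i2_read, i3_write, i4_read)  # 6 7 2 4

# Each read falls at least one cycle after the write it depends on,
# so neither pair needs a stall.
print(i2_read > i1_write, i4_read > i3_write)  # True True
```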
  • Referring to FIG. 6, a diagram of an example implementation of a device 140 is shown. The device (or apparatus) 140 may implement a pipelined processor. In some embodiments, the device 140 may implement an exposed pipelined processor. The device 140 generally comprises the circuit 102, the circuit 106 and the circuit 108. A circuit (or module) 142 may be in communication with the device 140 through a signal (e.g., INSTRc). The circuit 142 may represent a module and/or block that may be implemented as hardware, firmware, software, a combination of hardware, firmware and/or software, or other implementations.
  • The circuit 142 may implement a compiler circuit with a conflict detection capability. The circuit 142 is generally operational to perform the method 120 and adjust the sequence of instructions accordingly to resolve conflicts before the instructions are stored in the circuit 102. In some embodiments, the conflicts may be read-after-write resource conflicts. The conflicts may be resolved by inserting no-operation (NOP) instructions into the instruction schedule in step 134, rather than stalling the read instructions in the circuit 106.
  • Referring to FIG. 7, a diagram illustrating another example flow of several instructions through several pipeline stages of the circuit 106 is shown. The example generally illustrates the same conditions as in FIG. 1, where the instruction I1 writes to a resource X from stage S5, and instruction I3 reads from the resource X at stage S3.
  • When instruction I1 is detected by the circuit 142 per step 124, the circuit 142 may detect a write to the resource X from the stage S5 (G=N=5). The instruction I2 may be subsequently examined by the circuit 142. Steps 130 and 132 generally determine that instruction I2 is not involved in a read-after-write resource conflict with instruction I1. Therefore, step 136 may reduce the counter variable G from 5 to 4. Instruction I3 may be subsequently examined by the circuit 142. Step 132 may determine that a read-after-write conflict for resource X exists between instructions I1 and I3. Step 134 may calculate that a single cycle delay of instruction I3 may resolve the conflict. Therefore, the circuit 142 may insert a NOP instruction between the instructions I2 and I3 in the step 134. As illustrated in FIG. 7, when instruction I1 writes to the resource X from stage S5 in cycle 5, the instruction I3 is still in stage S2. When instruction I3 reaches stage S3 in cycle 6, the data written by instruction I1 is ready and available without any further delay.
  • Where the circuit 142 encounters the situation of FIG. 2, the results may be the same as illustrated in FIG. 5. The situation of FIG. 2 has instruction I1 writing to the resource X from stage S9, instruction I2 reading the resource X from stage S9, instruction I3 writing to the resource X from stage S3 and instruction I4 reading the resource X from stage S4. The circuit 142 may respond to the instructions I1 to I4 similar to the response of the circuit 104. By following the method 120, the circuit 142 may determine that (i) no read-after-write conflict exists between the instructions I1 and I2, (ii) instruction I3 negates any further read-after-write conflicts for the instruction I1 and (iii) no read-after-write conflict exists between the instructions I3 and I4. Therefore, the circuit 142 does not insert any NOP instructions in the sequence of instructions I1 to I4.
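  • The compiler-side behavior described for the circuit 142 can be sketched as a scheduling pass that pads the instruction sequence with NOP instructions in place of stalls. This is a hedged illustration under stated assumptions, not the actual compiler: `Instr`, `NOP`, `nops_needed` and `insert_nops` are hypothetical names, each instruction reads and/or writes at most one resource, and the pass is a single forward scan that does not revisit earlier writes after padding.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record; field names are illustrative only."""
    name: str
    writes: Optional[str] = None  # resource written, if any
    write_stage: int = 0          # stage N of the write
    reads: Optional[str] = None   # resource read, if any
    read_stage: int = 0           # stage M of the read


NOP = Instr("NOP")  # touches no resource


def nops_needed(seq: List[Instr], i: int) -> Optional[Tuple[int, int]]:
    """Return (reader index, NOP count G - M) for the write at seq[i], or None."""
    cur = seq[i]
    if cur.writes is None:
        return None
    g = cur.write_stage                       # step 126
    for j in range(i + 1, len(seq)):          # step 128
        nxt = seq[j]
        if nxt.writes == cur.writes and nxt.write_stage < g:
            return None                       # step 130: intervening write cancels
        if nxt.reads == cur.writes and nxt.read_stage < g:
            return j, g - nxt.read_stage      # step 134
        g -= 1                                # step 136
        if g == 0:                            # step 138
            return None
    return None


def insert_nops(seq: List[Instr]) -> List[Instr]:
    """Single forward pass (a simplification) that pads each conflicting read."""
    out = list(seq)
    i = 0
    while i < len(out):
        hit = nops_needed(out, i)
        if hit is not None:
            j, count = hit
            out[j:j] = [NOP] * count          # place the NOPs just before the read
        i += 1
    return out


fig1 = [Instr("I1", writes="X", write_stage=5),
        Instr("I2"),
        Instr("I3", reads="X", read_stage=3)]
print([ins.name for ins in insert_nops(fig1)])  # ['I1', 'I2', 'NOP', 'I3']
```

  • For the situation of FIG. 2, the same pass leaves the sequence I1 to I4 unchanged, because the write by the instruction I3 cancels the conflict at step 130.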
  • The functions performed by the diagrams of FIGS. 3, 4 and 6 may be implemented using one or more of a conventional general purpose processor, digital computer, microprocessor, microcontroller, RISC (reduced instruction set computer) processor, CISC (complex instruction set computer) processor, SIMD (single instruction multiple data) processor, signal processor, central processing unit (CPU), arithmetic logic unit (ALU), video digital signal processor (VDSP) and/or similar computational machines, programmed according to the teachings of the present specification, as will be apparent to those skilled in the relevant art(s). Appropriate software, firmware, coding, routines, instructions, opcodes, microcode, and/or program modules may readily be prepared by skilled programmers based on the teachings of the present disclosure, as will also be apparent to those skilled in the relevant art(s). The software is generally executed from a medium or several media by one or more of the processors of the machine implementation.
  • The present invention may also be implemented by the preparation of ASICs (application specific integrated circuits), Platform ASICs, FPGAs (field programmable gate arrays), PLDs (programmable logic devices), CPLDs (complex programmable logic devices), sea-of-gates, RFICs (radio frequency integrated circuits), ASSPs (application specific standard products) or by interconnecting an appropriate network of conventional component circuits, as is described herein, modifications of which will be readily apparent to those skilled in the art(s).
  • The present invention thus may also include a computer product which may be a storage medium or media and/or a transmission medium or media including instructions which may be used to program a machine to perform one or more processes or methods in accordance with the present invention. Execution of instructions contained in the computer product by the machine, along with operations of surrounding circuitry, may transform input data into one or more files on the storage medium and/or one or more output signals representative of a physical object or substance, such as an audio and/or visual depiction. The storage medium may include, but is not limited to, any type of disk including floppy disk, hard drive, magnetic disk, optical disk, CD-ROM, DVD and magneto-optical disks and circuits such as ROMs (read-only memories), RAMs (random access memories), EPROMs (electronically programmable ROMs), EEPROMs (electronically erasable ROMs), UVPROM (ultra-violet erasable ROMs), Flash memory, magnetic cards, optical cards, and/or any type of media suitable for storing electronic instructions.
  • The elements of the invention may form part or all of one or more devices, units, components, systems, machines and/or apparatuses. The devices may include, but are not limited to, servers, workstations, storage array controllers, storage systems, personal computers, laptop computers, notebook computers, palm computers, personal digital assistants, portable electronic devices, battery powered devices, set-top boxes, encoders, decoders, transcoders, compressors, decompressors, pre-processors, post-processors, transmitters, receivers, transceivers, cipher circuits, cellular telephones, digital cameras, positioning and/or navigation systems, medical equipment, heads-up displays, wireless devices, audio recording, storage and/or playback devices, video recording, storage and/or playback devices, game platforms, peripherals and/or multi-chip modules. Those skilled in the relevant art(s) would understand that the elements of the invention may be implemented in other types of devices to meet the criteria of a particular application. As used herein, the term “simultaneously” is meant to describe events that share some common time period but the term is not meant to be limited to events that begin at the same point in time, end at the same point in time, or have the same duration.
  • While the invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the invention.

Claims (20)

1. An apparatus comprising:
a processor having a pipeline; and
a circuit configured to (i) detect a first write instruction in said pipeline that writes to a resource, (ii) stall a read instruction in said pipeline where (a) a first read-after-write conflict exists between said first write instruction and said read instruction and (b) no other write instruction to said resource is scheduled between said first write instruction and said read instruction and (iii) not stall said read instruction due to said first read-after-write conflict where a second write instruction to said resource is scheduled between said first write instruction and said read instruction.
2. The apparatus according to claim 1, wherein said circuit is further configured to examine a determined number of subsequent instructions scheduled after said first instruction.
3. The apparatus according to claim 2, wherein said circuit is further configured to cancel said first read-after-write conflict in response to finding said second write instruction among said subsequent instructions.
4. The apparatus according to claim 2, wherein said determined number matches a stage number in said pipeline where said first instruction writes to said resource.
5. The apparatus according to claim 1, wherein said second write instruction writes to said resource before said read instruction reads from said resource.
6. The apparatus according to claim 1, wherein said first write instruction writes to said resource at a particular stage of said pipeline.
7. The apparatus according to claim 6, wherein (i) said read instruction reads from said resource at a given stage of said pipeline and (ii) said given stage is earlier in said pipeline than said particular stage.
8. The apparatus according to claim 6, wherein (i) said second write instruction writes to said resource at a given stage of said pipeline and (ii) said given stage is earlier in said pipeline than said particular stage.
9. The apparatus according to claim 1, wherein said circuit is further configured to check for a second read-after-write conflict between said second write instruction and said read instruction.
10. The apparatus according to claim 1, wherein said pipeline is an interlocked pipeline.
11. A method for eliminating read-after-write conflicts in a processor, comprising the steps of:
(A) detecting a first write instruction in a pipeline of said processor that writes to a resource;
(B) stalling a read instruction in said pipeline where (i) a first read-after-write conflict exists between said first write instruction and said read instruction and (ii) no other write instruction to said resource is scheduled between said first write instruction and said read instruction; and
(C) not stalling said read instruction due to said first read-after-write conflict where a second write instruction to said resource is scheduled between said first write instruction and said read instruction.
12. The method according to claim 11, further comprising the step of:
examining a determined number of subsequent instructions scheduled after said first instruction.
13. The method according to claim 12, further comprising the step of:
canceling said first read-after-write conflict in response to finding said second write instruction among said subsequent instructions.
14. The method according to claim 12, wherein said determined number matches a stage number in said pipeline where said first instruction writes to said resource.
15. The method according to claim 11, wherein said second write instruction writes to said resource before said read instruction reads from said resource.
16. The method according to claim 11, wherein said first write instruction writes to said resource at a particular stage of said pipeline.
17. The method according to claim 16, wherein (i) said read instruction reads from said resource at a given stage of said pipeline and (ii) said given stage is earlier in said pipeline than said particular stage.
18. The method according to claim 16, wherein (i) said second write instruction writes to said resource at a given stage of said pipeline and (ii) said given stage is earlier in said pipeline than said particular stage.
19. The method according to claim 11, further comprising the step of:
checking for a second read-after-write conflict between said second write instruction and said read instruction.
20. A method for eliminating read-after-write conflicts in a processor, comprising the steps of:
(A) detecting a first write instruction schedule in a pipeline of said processor that writes to a resource;
(B) inserting one or more no-operation instructions before a read instruction where (i) a first read-after-write conflict exists between said first write instruction and said read instruction and (ii) no other write instruction to said resource is scheduled between said first write instruction and said read instruction; and
(C) not inserting said no-operation instructions before said read instruction due to said first read-after-write conflict where a second write instruction to said resource is scheduled between said first write instruction and said read instruction.
US12/855,201 2010-08-12 2010-08-12 Avoiding stall in processor pipeline upon read after write resource conflict when intervening write present Active 2031-10-20 US8499139B2 (en)

Publications (2)

Publication Number Publication Date
US20120042152A1 true US20120042152A1 (en) 2012-02-16
US8499139B2 US8499139B2 (en) 2013-07-30

Family

ID=45565632

Country Status (1)

Country Link
US (1) US8499139B2 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5073855A (en) 1989-06-30 1991-12-17 Bull Hn Information Systems Inc. Resource conflict detection method and apparatus included in a pipelined processing unit
EP1004959B1 (en) 1998-10-06 2018-08-08 Texas Instruments Incorporated Processor with pipeline protection

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6470445B1 (en) * 1999-09-07 2002-10-22 Hewlett-Packard Company Preventing write-after-write data hazards by canceling earlier write when no intervening instruction uses value to be written by the earlier write

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140337393A1 (en) * 2013-05-13 2014-11-13 Amazon Technologies, Inc. Transaction ordering
US9760596B2 (en) * 2013-05-13 2017-09-12 Amazon Technologies, Inc. Transaction ordering
US10872076B2 (en) 2013-05-13 2020-12-22 Amazon Technologies, Inc. Transaction ordering

Also Published As

Publication number Publication date
US8499139B2 (en) 2013-07-30

Similar Documents

Publication Title
US11853763B2 (en) Backward compatibility by restriction of hardware resources
US7577825B2 (en) Method for data validity tracking to determine fast or slow mode processing at a reservation station
US6678807B2 (en) System and method for multiple store buffer forwarding in a system with a restrictive memory model
JP2875909B2 (en) Parallel processing unit
CN106406849B (en) Method and system for providing backward compatibility, non-transitory computer readable medium
JP5416223B2 (en) Memory model of hardware attributes in a transactional memory system
US8990786B2 (en) Program optimizing apparatus, program optimizing method, and program optimizing article of manufacture
US8332597B1 (en) Synchronization of external memory accesses in a dataflow machine
US6470445B1 (en) Preventing write-after-write data hazards by canceling earlier write when no intervening instruction uses value to be written by the earlier write
JP7084379B2 (en) Tracking stores and loads by bypassing loadstore units
JP5579694B2 (en) Method and apparatus for managing a return stack
JP2015507254A (en) Programs and computing devices with exceptions for code specialization in computer architectures that support transactions
CN110825437B (en) Method and apparatus for processing data
US8464008B1 (en) Command cancellation channel for read-modify-write operation in a memory
US9158545B2 (en) Looking ahead bytecode stream to generate and update prediction information in branch target buffer for branching from the end of preceding bytecode handler to the beginning of current bytecode handler
US8499139B2 (en) Avoiding stall in processor pipeline upon read after write resource conflict when intervening write present
US9081607B2 (en) Conditional transaction abort and precise abort handling
US9507725B2 (en) Store forwarding for data caches
CN111221573B (en) Management method of register access time sequence, processor, electronic equipment and computer readable storage medium
US20230315471A1 (en) Method and system for hardware-assisted pre-execution
US20160364240A1 (en) Methods and apparatus to optimize instructions for execution by a processor
US6401195B1 (en) Method and apparatus for replacing data in an operand latch of a pipeline stage in a processor during a stall
US20090031118A1 (en) Apparatus and method for controlling order of instruction
WO2021037124A1 (en) Task processing method and task processing device
CN117270972B (en) Instruction processing method, device, equipment and medium

Legal Events

Date Code Title Description
AS Assignment

Owner name: LSI CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DUBROVIN, LEONID;RABINOVITCH, ALEXANDER;MARGOLIN, HAGIT;AND OTHERS;SIGNING DATES FROM 20100810 TO 20100828;REEL/FRAME:025033/0271

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AG

Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:LSI CORPORATION;AGERE SYSTEMS LLC;REEL/FRAME:032856/0031

Effective date: 20140506

FEPP Fee payment procedure

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LSI CORPORATION;REEL/FRAME:035090/0477

Effective date: 20141114

AS Assignment

Owner name: LSI CORPORATION, CALIFORNIA

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS AT REEL/FRAME NO. 32856/0031;ASSIGNOR:DEUTSCHE BANK AG NEW YORK BRANCH;REEL/FRAME:035797/0943

Effective date: 20150420

AS Assignment

Owner name: LSI CORPORATION, CALIFORNIA

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS (RELEASES RF 032856-0031);ASSIGNOR:DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT;REEL/FRAME:037684/0039

Effective date: 20160201

Owner name: AGERE SYSTEMS LLC, PENNSYLVANIA

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS (RELEASES RF 032856-0031);ASSIGNOR:DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT;REEL/FRAME:037684/0039

Effective date: 20160201

AS Assignment

Owner name: BEIJING XIAOMI MOBILE SOFTWARE CO., LTD., CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTEL CORPORATION;REEL/FRAME:037733/0440

Effective date: 20160204

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8