WO2000070483A2 - Method and apparatus for pipelined processor segmentation and reassembly - Google Patents
Method and apparatus for pipelined processor segmentation and reassembly
- Publication number
- WO2000070483A2 (PCT/US2000/013221)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- stage
- instruction
- pipeline
- processor
- stalled
- Prior art date
Classifications
- G06F9/3867 — Concurrent instruction execution, e.g. pipeline or look ahead, using instruction pipelines
- G06F30/30 — Computer-aided design [CAD]; Circuit design
- G06F9/3836 — Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838 — Dependency mechanisms, e.g. register scoreboarding
- G06F9/3854 — Instruction completion, e.g. retiring, committing or graduating
- G06F9/3858 — Result writeback, i.e. updating the architectural state or memory
Definitions
- the present invention relates to the field of integrated circuit design, specifically to the use of a hardware description language (HDL) for implementing instructions in a pipelined central processing unit (CPU) or user-customizable microprocessor.
- RISC processors are well known in the computing arts.
- RISC processors generally have the fundamental characteristic of utilizing a substantially reduced instruction set as compared to non-RISC (commonly known as "CISC") processors.
- RISC processor machine instructions are not all micro-coded, but rather may be executed immediately without decoding, thereby affording significant economies in terms of processing speed.
- This "streamlined" instruction handling capability furthermore allows greater simplicity in the design of the processor (as compared to non-RISC devices), thereby allowing smaller silicon and reduced cost of fabrication.
- RISC processors are also typically characterized by (i) load/store memory architecture (i.e., only the load and store instructions have access to memory; other instructions operate via internal registers within the processor); (ii) unity of processor and compiler; and (iii) pipelining.
- Pipelining is a technique for increasing the performance of a processor by dividing the sequence of operations within the processor into discrete components which are effectively executed in parallel where possible.
- the arithmetic units associated with processor arithmetic operations (such as ADD, MULTIPLY, DIVIDE, etc.) are usually "segmented", so that a specific portion of the operation is performed in a given component of the unit during any clock cycle.
- Fig. 1 illustrates a typical processor architecture having such segmented arithmetic units. Hence, these units can operate on the results of a different calculation at any given clock cycle.
- two numbers A and B are fed to the multiplier unit 10 and partially processed by the first segment 12 of the unit.
- the partial results from multiplying A and B are passed to the second segment 14 while the first segment 12 receives two new numbers (say C and D) to start processing.
- the net result is that after an initial startup period, one multiplication operation is performed by the arithmetic unit 10 every clock cycle.
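The one-result-per-cycle throughput of such a segmented unit can be illustrated with a simple behavioral model. This is an illustrative Python sketch, not the patent's HDL; the two-segment multiplier and the operand values are hypothetical.

```python
# Behavioral model of a two-segment ("segmented") multiplier pipeline.
# Each clock cycle, segment 1 starts a new multiply while segment 2
# finishes the previous one, so one result completes per cycle after
# an initial two-cycle startup latency.

def run_segmented_multiplier(operand_pairs):
    seg1 = None          # operand pair held in segment 1
    seg2 = None          # partial result held in segment 2
    results = []
    inputs = list(operand_pairs)
    cycles = 0
    while inputs or seg1 is not None or seg2 is not None:
        # Clock edge: segment 2 completes, segment 1 advances, new pair enters.
        if seg2 is not None:
            results.append(seg2)
        seg2 = seg1[0] * seg1[1] if seg1 is not None else None
        seg1 = inputs.pop(0) if inputs else None
        cycles += 1
    return results, cycles

pairs = [(2, 3), (4, 5), (6, 7), (8, 9)]
results, cycles = run_segmented_multiplier(pairs)
print(results)   # [6, 20, 42, 72]
print(cycles)    # 6 cycles: 4 multiplies + 2-cycle pipeline fill
```

Note that four multiplies take only six cycles rather than eight: after the two-cycle fill, a result emerges every cycle.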
- the depth of the pipeline may vary from one architecture to another.
- depth refers to the number of discrete stages present in the pipeline.
- a pipeline with more stages executes programs faster but may be more difficult to program if the pipeline effects are visible to the programmer.
- Most pipelined processors have either three stages (instruction fetch, decode, and execute) or four stages (such as instruction fetch, decode, operand fetch, and execute; or alternatively instruction fetch, decode/operand fetch, execute, and writeback), although more or fewer stages may be used.
- instructions in one stage generally follow immediately after instructions in a later stage with a minimum of blank slots, NOP codes, or the like. Furthermore, when an instruction at a later stage is stalled (such as when an instruction in the execution stage is awaiting information from a fetch operation), the earlier and later stages of the pipeline are also stalled. In this manner, the pipeline tends to operate largely in "lock-step" fashion.
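As a point of reference, this lock-step behavior can be modeled in a few lines. The following is an illustrative Python sketch (not drawn from the patent's VHDL; the list-based stage representation is hypothetical) showing that a stall anywhere freezes the entire pipeline.

```python
# Baseline "lock-step" pipeline behavior: when any stage stalls, the
# earlier *and* later stages are held, so the whole pipeline freezes
# together.  (The invention described below relaxes this by letting
# later stages continue.)

def lockstep_cycle(pipe, stalled_stage):
    """pipe is a list, index 0 = earliest stage; None = empty slot."""
    if stalled_stage is not None:
        return pipe[:]                   # every stage holds its contents
    # Normal advance: the last stage retires, the others shift forward.
    return [None] + pipe[:-1]

pipe = ["C", "B", "A"]                   # three-stage pipeline, A is oldest
print(lockstep_cycle(pipe, stalled_stage=1))    # ['C', 'B', 'A'] -- frozen
print(lockstep_cycle(pipe, stalled_stage=None)) # [None, 'C', 'B'] -- advance
```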
- When developing the instruction set of a pipelined processor, several different types of "hazards" must be considered. For example, so-called "structural" or "resource contention" hazards arise from overlapping instructions competing for the same resources (such as busses, registers, or other functional units), and are typically resolved using one or more pipeline stalls. So-called "data" pipeline hazards occur in the case of read/write conflicts which may change the order of memory or register accesses. "Control" hazards are generally produced by branches or similar changes in program flow. Interlocks are generally necessary with pipelined architectures to address many of these hazards. For example, consider the case where a following instruction (n+1) in an earlier pipeline stage needs the result of instruction n from a later stage.
- a simple solution to the aforementioned problem is to delay the operand calculation in the instruction decoding phase by one or more clock cycles.
- a result of such delay is that the execution time of a given instruction on the processor is in part determined by the instructions surrounding it within the pipeline. This complicates optimization of the code for the processor, since it is often difficult for the programmer to spot interlock situations within the code.
- "Scoreboarding" may be used in the processor to implement interlocks; in this approach, a bit is attached to each processor register to act as an indicator of the register content; specifically, whether (i) the contents of the register have been updated and are therefore ready for use, or (ii) the contents are undergoing modification such as being written to by another process.
- This scoreboard is also used in some architectures to generate interlocks which prevent instructions which are dependent upon the contents of the scoreboarded register from executing until the scoreboard indicates that the register is ready.
- This type of approach is referred to as "hardware" interlocking, since the interlock is invoked purely through examination of the scoreboard via hardware within the processor.
- Such interlocks generate "stalls" which preclude the data dependent instruction from executing (thereby stalling the pipeline) until the register is ready.
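A register scoreboard of the kind described can be modeled as one busy bit per register. The following Python sketch is illustrative only (the patent does not prescribe this structure, and the register numbers are hypothetical).

```python
# Minimal register-scoreboard model: one busy bit per register.
# An instruction that reads a scoreboarded (busy) register is stalled
# by the hardware interlock until the pending write completes and the
# bit is cleared.

class Scoreboard:
    def __init__(self, num_regs):
        self.busy = [False] * num_regs

    def issue_write(self, reg):
        """Mark a register as undergoing modification (e.g. a load in flight)."""
        self.busy[reg] = True

    def complete_write(self, reg):
        """Writeback finished; register contents are ready for use."""
        self.busy[reg] = False

    def must_stall(self, source_regs):
        """Hardware interlock: stall if any source register is busy."""
        return any(self.busy[r] for r in source_regs)

sb = Scoreboard(num_regs=8)
sb.issue_write(3)                 # e.g. a load into r3 is outstanding
print(sb.must_stall([1, 3]))      # True  -> dependent instruction stalls
sb.complete_write(3)
print(sb.must_stall([1, 3]))      # False -> instruction may proceed
```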
- Alternatively, no-operation opcodes (NOPs) may be inserted into the code by the programmer or compiler to delay the data-dependent instruction until the required result is available. This latter approach has been referred to as "software" interlocking, and has the disadvantage of increasing the code size and complexity of programs that employ instructions that require interlocking. Heavily software-interlocked designs also tend not to be fully optimized in terms of their code structures.
- Branching refers to the condition where program flow is interrupted or altered. Other operations such as loop setup and subroutine call instructions also interrupt or alter program flow in a similar fashion.
- the term "jump delay slot” is often used to refer to the slot within a pipeline subsequent to a branching or jump instruction being decoded. The instruction after the branch (or load) is executed while awaiting completion of the branch/load instruction. Branching may be conditional (i.e., based on the truth or value of one or more parameters) or unconditional. It may also be absolute (e.g., based on an absolute memory address), or relative (e.g., based on relative addresses and independent of any particular memory address).
- Branching can have a profound effect on pipelined systems.
- When a branch instruction is decoded by the processor's instruction decode stage (indicating that the processor must begin executing from a different address), the next instruction word in the instruction sequence has already been fetched and inserted into the pipeline.
- One solution to this problem is to purge the fetched instruction word and halt or stall further fetch operations until the branch instruction has been executed, as illustrated in Fig. 2.
- This approach results in the execution of the branch instruction in several instruction cycles, typically equal to the depth of the pipeline employed in the processor design. This result is deleterious to processor speed and efficiency, since other operations cannot be conducted by the processor during this period.
- a delayed branch approach may be employed.
- the pipeline is not purged when a branch instruction reaches the decode stage, but rather subsequent instructions present in the earlier stages of the pipeline are executed normally before the branch is executed.
- the branch appears to be delayed by the number of instruction cycles necessary to execute all subsequent instructions in the pipeline at the time the branch instruction is decoded.
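The delayed-branch behavior can be sketched as follows, here with a single delay slot (the number of slots depends on pipeline depth, as noted above). This Python model and its instruction encoding are hypothetical illustrations, not the patent's implementation.

```python
# Illustrative delayed-branch execution: when a branch is decoded, the
# instruction(s) already fetched behind it (the "delay slots") execute
# before the branch takes effect, instead of being purged.

def execute_with_delayed_branch(program, delay_slots=1):
    pc = 0
    trace = []                     # record of executed instructions
    while pc < len(program):
        op = program[pc]
        trace.append(op[0] if isinstance(op, tuple) else op)
        if isinstance(op, tuple) and op[0] == "branch":
            # Execute the delay-slot instruction(s) before redirecting.
            for slot in range(1, delay_slots + 1):
                if pc + slot < len(program):
                    trace.append(program[pc + slot])
            pc = op[1]             # branch target
        else:
            pc += 1
    return trace

program = ["i0", ("branch", 4), "slot", "skipped", "i4"]
print(execute_with_delayed_branch(program))
# ['i0', 'branch', 'slot', 'i4'] -- the slot runs, 'skipped' does not
```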
- processor designers and programmers must carefully weigh the tradeoffs associated with utilizing hardware or software interlocks as opposed to a non- interlock architecture. Furthermore, the interaction of branching instructions (and delayed or multi-cycle branching) in the instruction set with the selected interlock scheme must be considered.
- the present invention satisfies the aforementioned needs by providing an improved method and apparatus for executing instructions within a pipelined processor architecture.
- an improved method of controlling the operation of one or more pipelines within a processor is disclosed.
- a method of pipeline segmentation is disclosed whereby (i) instructions in stages prior to a stalled stage are also stalled, and (ii) instructions in stages subsequent to the stalled instruction are permitted to complete.
- a discontinuity or "tear" in the pipeline is purposely created.
- a blank slot or NOP is inserted into the subsequent stage of the pipeline to preclude the executed instruction present in the torn stage from being executed multiple times.
- A method is also disclosed which permits instructions otherwise stalled at earlier stages in the pipeline to be re-assembled with ("catch up" to) later stalled stages, thereby effectively repairing any tear or existing pipeline discontinuity.
- an improved method of synthesizing the design of an integrated circuit incorporating the aforementioned jump delay slot method comprises obtaining user input regarding the design configuration; creating customized HDL functional blocks based on the user's input and existing library of functions; determining the design hierarchy based on the user's input and the library and generating a hierarchy file, new library file, and makefile; running the makefile to create the structural HDL and scripts; running the generated scripts to create a makefile for the simulator and a synthesis script; and synthesizing the design based on the generated design and synthesis script.
- an improved computer program useful for synthesizing processor designs and embodying the aforementioned methods comprises an object code representation stored on the magnetic storage device of a microcomputer, and adapted to run on the central processing unit thereof.
- the computer program further comprises an interactive, menu-driven graphical user interface (GUI), thereby facilitating ease of use.
- an improved apparatus for running the aforementioned computer program used for synthesizing logic associated with pipelined processors comprises a stand-alone microcomputer system having a display, central processing unit, data storage device(s), and input device.
- an improved processor architecture utilizing the foregoing pipeline tearing and catch-up methodologies is disclosed.
- the processor comprises a reduced instruction set computer (RISC) having a three stage pipeline comprising instruction fetch, decode, and execute stages which are controlled in part by the aforementioned pipeline tearing/catch-up modes.
- Synthesized gate logic, both constrained and unconstrained, is also disclosed.
- Fig. 1 is a block diagram of a typical prior art processor architecture employing "segmented" arithmetic units.
- Fig. 2 illustrates graphically the operation of a prior art four stage pipelined processor undergoing a multi-cycle branch operation.
- Fig. 3 is a pipeline flow diagram illustrating the concept of "tearing" in a multistage pipeline according to the present invention.
- Fig. 4 is a logical flow diagram illustrating the generalized methodology of controlling a pipeline using "tearing" according to the present invention.
- Fig. 5 is a pipeline flow diagram illustrating the concept of "catch-up" in a multistage pipeline according to the present invention.
- Fig. 6 is a logical flow diagram illustrating the generalized methodology of controlling a pipeline using "catch-up" according to the present invention.
- Fig. 7 is a logical flow diagram illustrating the generalized methodology of synthesizing processor logic which incorporates pipeline tearing/catch-up modes according to the present invention.
- Figs. 8a-8b are schematic diagrams illustrating one exemplary embodiment of gate logic implementing the pipeline "tearing" functionality of the invention (unconstrained and constrained, respectively), synthesized using the method of Fig. 7.
- Figs. 8c-8d are schematic diagrams illustrating one exemplary embodiment of gate logic implementing the pipeline "catch-up" functionality of the invention (unconstrained and constrained, respectively), synthesized using the method of Fig. 7.
- Fig. 9 is a block diagram of a processor design incorporating pipeline tearing/catchup modes according to the present invention.
- Fig. 10 is a functional block diagram of a computing device using a computer program incorporating the methodology of Fig. 7 to synthesize a pipelined processor design.
- processor is meant to include any integrated circuit or other electronic device capable of performing an operation on at least one instruction word including, without limitation, reduced instruction set core (RISC) processors such as the ARC user-configurable core manufactured by the Assignee hereof, central processing units (CPUs), and digital signal processors (DSPs).
- stage refers to various successive stages within a pipelined processor; i.e., stage 1 refers to the first pipelined stage, stage 2 to the second pipelined stage, etc. While the following discussion is cast in terms of a three stage pipeline (i.e., instruction fetch, decode, and execution stages), it will be appreciated that the methodology and apparatus disclosed herein are broadly applicable to processor architectures with one or more pipelines having more or less than three stages.
- The VHSIC hardware description language (VHDL) is used to describe the embodiments set forth herein, although other hardware description languages (such as Verilog®) may be used with equal success.
- an exemplary Synopsys® synthesis engine such as the Design Compiler 1999.05 (DC99) is used to synthesize the various embodiments set forth herein
- other synthesis engines such as Buildgates® available from Cadence Design Systems, Inc., may be used.
- IEEE Std. 1076.3-1997, IEEE Standard VHDL Synthesis Packages, specifies an industry-accepted language for specifying a Hardware Description Language-based design and the synthesis capabilities that may be expected to be available to one of ordinary skill in the art.
- the architecture of the present invention includes a generally free-flowing pipeline. If a stage in the pipeline is stalled, then the previous stages will also be stalled if they contain instructions. However, despite the stalling of these previous stages, there are several advantages to having later (i.e., "downstream") stages in the pipeline continue, if no interlocks are otherwise applied.
- Fig. 3 graphically illustrates this principle (assuming no interlocks are applied).
- the first step 402 of the method 400 comprises generating an instruction set comprising a plurality of instruction words to be run on the processor.
- This instruction set is typically stored in an on-chip program storage device (such as a program RAM or ROM memory) of the type well known in the art, although other types of device, including off-chip memory, may be used.
- the generation of the instruction set itself is also well known in the art, except to the extent that it is modified to include the pipeline tearing functionality, such modification being described in greater detail below.
- step 404 the instruction set (program) is sequentially fetched from the storage device in the designated sequence by, inter alia, the program counter (PC) and run on the processor, with the fetched instructions being sequentially processed within the various stages of the pipeline.
- Load/store instructions may access program memory space; hence, a plurality of intermediate registers are employed in such a processor to physically receive and hold instruction information fetched from the program memory.
- load/store architecture and use of register structures within a processor are well known in the art, and accordingly will not be described further herein.
- a stall condition in one stage of the pipeline is detected by logic blocks which combine signals to determine if a conflict is taking place, typically for access to a data value or other resource.
- An example of this is the detection of the condition where a register being read by an instruction is marked as 'scoreboarded', meaning that the processor must wait until the register is updated with a new value.
- Another example is when stall cycles are generated by a state machine whilst a multicycle operation (for example a shift and add multiply) is carried out.
- a "valid instruction” is one which is not marked as “invalid” for any reason (step 410), and which has successfully completed processing in the prior (Nth) stage (step 412).
- the "p3iv" signal (i.e., "stage 3 instruction valid") is used to indicate that stage 3 of the pipeline contains a valid instruction.
- The instruction in stage 3 may not be valid for a number of reasons, including: (1) the instruction was marked as invalid when it moved into stage 2 (i.e., the p2iv signal was not set); or (2) stage 3 has been marked as invalid by the pipeline tearing logic on a previous cycle, but has not subsequently been replaced by an instruction moving into stage 3 from stage 2.
- If the instruction present in stage 2 is determined in step 412 not to have been able to complete processing (item 2 above), while the instruction at stage 3 is able to complete processing, stage 3 is marked as invalid once its instruction has completed, thereby creating a "tear" in the pipeline.
- An alternative method is to insert a NOP or other blank instruction into stage 3, and mark stage 3 as valid. If this blank is not inserted or the stage marked invalid, the instruction which was executed in stage 3 at the time the instruction in stage 2 could not complete processing will be executed again on the next instruction cycle, which is not desired.
- step 418 the valid instruction present in stage 3 (and subsequent stages in a pipeline having four or more stages) is executed on the next clock cycle while maintaining the instruction present in stage 2 stalled in that stage. Note that on subsequent clock cycles, processing of the stalled instruction in stage 2 may occur, dependent on the status of the stall/interlock signal causing the stall. Once the stall interlock signal is disabled, processing of the stalled instruction in that stage will proceed at the leading edge of the next instruction cycle.
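The tearing behavior of steps 412-418 can be summarized as a cycle-level model. This is an illustrative Python sketch rather than the VHDL of Appendix I; the dictionary-based stage representation and function name are hypothetical, and the bubble here plays the role of the invalid slot (or inserted NOP) described above.

```python
# Cycle model of pipeline "tearing": when the stage-2 instruction stalls
# but the stage-3 instruction can complete, stage 3 is allowed to finish
# and is then left invalid (a bubble), so the completed instruction is
# not executed a second time.  Stages 1 and 2 hold their contents.

def tearing_cycle(pipe, stage2_stalled):
    """pipe = {'s1', 's2', 's3'}; None represents an invalid slot."""
    completed = []
    if pipe["s3"] is not None:
        completed.append(pipe["s3"])      # stage 3 completes
    if stage2_stalled:
        pipe["s3"] = None                 # tear: bubble moves into stage 3
        # s1 and s2 are held in place
    else:
        pipe["s3"] = pipe["s2"]           # normal advance
        pipe["s2"] = pipe["s1"]
        pipe["s1"] = None                 # slot refilled by the next fetch
    return completed

pipe = {"s1": "C", "s2": "B", "s3": "A"}
print(tearing_cycle(pipe, stage2_stalled=True))   # ['A'] completes
print(pipe)   # {'s1': 'C', 's2': 'B', 's3': None} -- torn pipeline
print(tearing_cycle(pipe, stage2_stalled=True))   # [] -- no re-execution of A
```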
- the following exemplary code, extracted from Appendix I hereto, is used in conjunction with Applicant's ARC Core (three stage pipeline variant) to implement the "tearing" functionality previously described:
- the present invention also employs mechanisms to address the reverse situation; i.e., allowing earlier stages of the pipeline to continue processing or "catch-up" to the later stages when empty slots or spaces are present between the stages, or the pipeline has otherwise been "torn". This function is also known as "pipeline transition enable.”
- stage 1 is permitted to catch-up to stage 2 on the clock edge by allowing continued processing of the stage 1 instruction until completion, at which point it is advanced into stage 2, and a new instruction is advanced into stage 1.
- Fig. 5 illustrates this concept graphically.
- Fig. 6 the method of controlling a multi-stage processor pipeline using the "catch-up" technique of the present invention is described.
- a valid instruction is defined simply as one which has not been marked as invalid when it moved into its current stage (stage 2 in the present example).
- the pipeline transition enable signal is set "true" per step 610 as discussed in greater detail below.
- the pipeline transition enable signal described controls the transition of an instruction word from stage 1 into stage 2.
- a pipeline 'catch-up' would occur in this event if the instruction in stage 3 were not able to complete processing.
- the invalid slot in stage 2 would be replaced by an advancing instruction from stage 1, whilst the instruction at stage 3 would remain at stage 3.
- If the valid instruction in stage 2 cannot complete on the next cycle, the transition enable signal is again set "false", thereby again precluding the valid instruction in stage 2 from being replaced, since the valid (yet uncompleted) instruction will not advance to stage 3 upon the next cycle. If the valid instruction in stage 2 is capable of completing on the next cycle, and is not waiting for a pending fetch, the transition enable signal is set to "true" per step 610, thereby permitting the stage 1 instruction to advance to stage 2 at the same time as the instruction in stage 2 moves into stage 3.
- the pipeline transition enable signal is set “true” at all times when the processor is running except when: (i) a valid instruction in stage 2 cannot complete for some reason; or (ii) if an interrupt in stage 2 is waiting for a pending instruction fetch to complete. It is noted that if an invalid instruction in stage 2 is held (due to, inter alia, a stall at stage 3) then the transition enable signal will be set "true” and allow the instruction in stage 1 to move into stage 2. Hence, the invalid stage 2 instruction will be replaced by the valid stage 1 instruction.
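The conditions above can be captured in a small predicate. This is a Python model of the transition-enable logic, not the VHDL "enl" signal of Appendix II itself; the flag names are hypothetical.

```python
# Sketch of the pipeline-transition-enable ("catch-up") condition: the
# stage-1 -> stage-2 transition is enabled at all times while running,
# except when a *valid* stage-2 instruction cannot complete, or an
# interrupt in stage 2 is waiting for a pending instruction fetch.

def transition_enable(s2_valid, s2_can_complete, interrupt_awaiting_fetch):
    if s2_valid and not s2_can_complete:
        return False                 # hold: valid stage-2 instruction stalled
    if interrupt_awaiting_fetch:
        return False                 # hold: interrupt awaiting pending fetch
    return True

# A held *invalid* stage-2 slot does not block the transition, so a
# bubble left by a stage-3 stall is replaced by the advancing instruction:
print(transition_enable(s2_valid=False, s2_can_complete=False,
                        interrupt_awaiting_fetch=False))   # True
print(transition_enable(s2_valid=True, s2_can_complete=False,
                        interrupt_awaiting_fetch=False))   # False
```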
- the "catch-up" or pipeline transition enable signal (enl) of the present invention may be, in one embodiment, generated using the following exemplary code (extracted from Appendix II) hereto:
- pipeline tearing and catch-up methods of the present invention may be used in conjunction with (either alone or collectively) other methods of pipeline control and interlock including, inter alia, those disclosed in Applicant's co-pending U.S. Patent Application entitled “Method And Apparatus For Jump Control In A Pipelined Processor,” as well as those disclosed in Applicant's co-pending U.S. Patent Application “Method And Apparatus For Jump Delay Slot Control In A Pipelined Processor,” both filed contemporaneously herewith, both being incorporated by reference herein in their entirety.
- various register encoding schemes, such as the "loose" register encoding described in Applicant's co-pending U.S. Patent Application entitled "Method and Apparatus for Loose Register Encoding Within a Pipelined Processor," filed contemporaneously herewith and incorporated by reference in its entirety herein, may be used in conjunction with the pipeline tearing and/or catch-up inventions described herein.
- In the first step 702, user input is obtained regarding the design configuration. Specifically, desired modules or functions for the design are selected by the user, and instructions relating to the design are added, subtracted, or generated as necessary. For example, in signal processing applications, it is often advantageous for CPUs to include a single "multiply and accumulate" (MAC) instruction.
- the instruction set of the synthesized design is modified so as to incorporate the foregoing pipeline tearing and/or catch-up modes (or another comparable pipeline control architecture) therein.
- the technology library location for each VHDL file is also defined by the user in step 702.
- the technology library files in the present invention store all of the information related to cells necessary for the synthesis process, including for example logical function, input/output timing, and any associated constraints. In the present invention, each user can define his/her own library name and location(s), thereby adding further flexibility.
- step 703 the user creates customized HDL functional blocks based on the user's input and the existing library of functions specified in step 702.
- step 704 the design hierarchy is determined based on the user's input and the aforementioned library files.
- a hierarchy file, new library file, and makefile are subsequently generated based on the design hierarchy.
- makefile refers to the commonly used UNIX makefile function or similar function of a computer system well known to those of skill in the computer programming arts.
- the makefile function causes other programs or algorithms resident in the computer system to be executed in the specified order.
- it further specifies the names or locations of data files and other information necessary to the successful operation of the specified programs. It is noted, however, that the invention disclosed herein may utilize file structures other than the "makefile” type to produce the desired functionality.
- the user is interactively asked via display prompts to input information relating to the desired design such as the type of "build” (e.g., overall device or system configuration), width of the external memory system data bus, different types of extensions, cache type/size, etc.
- in step 706, the user runs the makefile generated in step 704 to create the structural HDL.
- This structural HDL ties the discrete functional blocks in the design together so as to make a complete design.
- in step 708, the script generated in step 706 is run to create a makefile for the simulator.
- the user also runs the script to generate a synthesis script in step 708.
- if the design is deemed unacceptable, the process steps beginning with step 702 are re-performed until an acceptable design is achieved. In this fashion, the method 700 is iterative.
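The iterative character of method 700 can be summarized as a loop; `build_step` and `acceptable` below are placeholders standing in for steps 702-708 and for the user's acceptance judgment, respectively:

```python
def iterate_design(build_step, acceptable, max_iters=10):
    """Re-run the flow from step 702 until `acceptable(design)` holds.
    `build_step(attempt)` stands in for steps 702-708 of method 700."""
    for attempt in range(1, max_iters + 1):
        design = build_step(attempt)   # define, generate HDL, run scripts
        if acceptable(design):
            return design, attempt
    raise RuntimeError("no acceptable design within iteration budget")
```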
- referring now to FIGs. 8a-8b, one embodiment of exemplary gate logic (including the "p3iv" signal referenced in the VHDL of Appendix I) synthesized using the aforementioned Synopsys® Design Compiler and the methodology of Fig. 7 is illustrated. Note that during the synthesis process used to generate the logic of Fig. 8a, an LSI 10k 1.0 µm process was specified, and no constraints were placed on the design. With respect to the logic of Fig. 8b, the same process was used; however, the design was constrained on the path from ien3 to the clock. Appendix III contains the coding used to generate the exemplary logic of FIGs. 8a-8b.
- referring now to FIGs. 8c-8d, one embodiment of exemplary gate logic (including the "ien1" signal referenced in the VHDL of Appendix II) synthesized using the methodology of Fig. 7 is illustrated. Note that during the synthesis process used to generate the logic of Fig. 8c, an LSI 10k 1.0 µm process was specified, and no constraints were placed on the design. With respect to the logic of Fig. 8d, the same process was used; however, the design was constrained to preclude the use of AND-OR gates. Appendix IV contains the coding used to generate the exemplary logic of FIGs. 8c-8d.
- Fig. 9 illustrates an exemplary pipelined processor fabricated using a 1.0 µm process and incorporating the pipeline tearing and catch-up modes previously described herein.
- the processor 900 is an ARC microprocessor-like CPU device having, inter alia, a processor core 902, on-chip memory 904, and an external interface 906.
- the device is fabricated using the customized VHDL design obtained using the method 700 described above.
- the processor of Figure 9 may contain any commonly available peripheral such as serial communications devices, parallel ports, timers, counters, high current drivers, analog to digital (A/D) converters, digital to analog (D/A) converters, interrupt processors, LCD drivers, memories, and other similar devices. Further, the processor may also include custom or application specific circuitry.
- the present invention is not limited to the type, number, or complexity of peripherals and other circuitry that may be combined using the method and apparatus. Rather, any limitations are imposed by the physical capacity of the extant semiconductor processes, which improve over time. Therefore, it is anticipated that the complexity and degree of integration possible employing the present invention will further increase as semiconductor processes improve.
- the computing device 1000 comprises a motherboard 1001 having a central processing unit (CPU) 1002, random access memory (RAM) 1004, and memory controller 1005.
- a storage device 1006 (such as a hard disk drive or CD-ROM), input device 1007 (such as a keyboard or mouse), and display device 1008 (such as a CRT, plasma, or TFT display), as well as buses necessary to support the operation of the host and peripheral components, are also provided.
- the aforementioned VHDL descriptions and synthesis engine are stored in the form of an object code representation of a computer program in the RAM 1004 and/or storage device 1006 for use by the CPU 1002 during design synthesis, the latter being well known in the computing arts.
- the user (not shown) synthesizes logic designs by inputting design configuration specifications into the synthesis program via the program displays and the input device 1007 during system operation. Synthesized designs generated by the program are stored in the storage device 1006 for later retrieval, displayed on the graphic display device 1008, or output to an external device such as a printer, data storage unit, or other peripheral component via a serial or parallel port 1012 if desired.
- entity v007a is port ( ck : in std_ulogic; clr : in std_ulogic; ien2 : in std_ulogic; ien3 : in std_ulogic; ip2iv : in std_ulogic; p3iv : out std_ulogic ); end v007a;
- signal n_p3iv : std_ulogic;
- signal ip3iv : std_ulogic;
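A hypothetical behavioral reading of an entity like v007a — a clocked "stage-3 instruction valid" flag — may be sketched as follows. The update rule (synchronous clear; latch the incoming valid bit when the stage-3 enable ien3 is asserted; hold otherwise) is an assumption made only for illustration; the actual logic is given in the VHDL of the appendices and is not reproduced here.

```python
class StageValid:
    """Hypothetical model of a stage-validity flop such as p3iv.
    The hold-when-disabled case is what preserves stage state across a
    stall (the tearing behavior described earlier)."""

    def __init__(self):
        self.p3iv = 0

    def clock(self, clr, ien3, ip2iv):
        if clr:
            self.p3iv = 0          # reset: no valid instruction in stage 3
        elif ien3:
            self.p3iv = ip2iv      # stage enabled: accept incoming valid bit
        # ien3 == 0 and no clear: stage stalled, hold the current value
        return self.p3iv
```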
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- Advance Control (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00930715A EP1190337A2 (fr) | 1999-05-13 | 2000-05-12 | Procede et appareil de segmentation et de reassemblage d'un processeur pipeline |
AU48487/00A AU4848700A (en) | 1999-05-13 | 2000-05-12 | Method and apparatus for processor pipeline segmentation and re-assembly |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13425399P | 1999-05-13 | 1999-05-13 | |
US60/134,253 | 1999-05-13 | ||
US09/418,663 | 1999-10-14 | ||
US09/418,663 US6862563B1 (en) | 1998-10-14 | 1999-10-14 | Method and apparatus for managing the configuration and functionality of a semiconductor design |
US52417900A | 2000-03-13 | 2000-03-13 | |
US09/524,179 | 2000-03-13 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2000070483A2 true WO2000070483A2 (fr) | 2000-11-23 |
WO2000070483A3 WO2000070483A3 (fr) | 2001-08-09 |
Family
ID=27384547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2000/013221 WO2000070483A2 (fr) | 1999-05-13 | 2000-05-12 | Procede et appareil de segmentation et de reassemblage d'un processeur pipeline |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1190337A2 (fr) |
CN (1) | CN1217261C (fr) |
AU (1) | AU4848700A (fr) |
TW (1) | TW589544B (fr) |
WO (1) | WO2000070483A2 (fr) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100451951C (zh) * | 2006-01-26 | 2009-01-14 | 深圳艾科创新微电子有限公司 | Risc cpu中的5+3级流水线设计方法 |
CN102830953B (zh) * | 2012-08-02 | 2017-08-25 | 中兴通讯股份有限公司 | 指令处理方法及网络处理器指令处理装置 |
CN104793987B (zh) * | 2014-01-17 | 2018-08-03 | 中国移动通信集团公司 | 一种数据处理方法及装置 |
CN111399912B (zh) * | 2020-03-26 | 2022-11-22 | 超睿科技(长沙)有限公司 | 一种面向多周期指令的指令调度方法、系统及介质 |
CN113961247B (zh) * | 2021-09-24 | 2022-10-11 | 北京睿芯众核科技有限公司 | 一种基于risc-v处理器的向量存/取指令执行方法、系统及装置 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0352103A2 (fr) * | 1988-07-20 | 1990-01-24 | Digital Equipment Corporation | Ecrasement des bulles du pipeline dans un système de calcul |
EP0649085A1 (fr) * | 1993-10-18 | 1995-04-19 | Cyrix Corporation | Contrôle de pipeline et traduction de régistre pour microprocesseur |
US5809320A (en) * | 1990-06-29 | 1998-09-15 | Digital Equipment Corporation | High-performance multi-processor having floating point unit |
-
2000
- 2000-05-12 WO PCT/US2000/013221 patent/WO2000070483A2/fr active Application Filing
- 2000-05-12 AU AU48487/00A patent/AU4848700A/en not_active Abandoned
- 2000-05-12 CN CN008084580A patent/CN1217261C/zh not_active Expired - Fee Related
- 2000-05-12 EP EP00930715A patent/EP1190337A2/fr not_active Withdrawn
- 2000-07-05 TW TW089109198A patent/TW589544B/zh not_active IP Right Cessation
Non-Patent Citations (2)
Title |
---|
"CONDITION REGISTER COHERENCY LOOK-AHEAD" RESEARCH DISCLOSURE,GB,INDUSTRIAL OPPORTUNITIES LTD. HAVANT, no. 348, 1 April 1993 (1993-04-01), page 243 XP000304185 ISSN: 0374-4353 * |
DIEFENDORFF K ET AL: "ORGANIZATION OF THE MOTOROLA 88110 SUPERSCALAR RISC MICROPROCESSOR" IEEE MICRO,US,IEEE INC. NEW YORK, vol. 12, no. 2, 1 April 1992 (1992-04-01), pages 40-63, XP000266192 ISSN: 0272-1732 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8386972B2 (en) | 1998-10-14 | 2013-02-26 | Synopsys, Inc. | Method and apparatus for managing the configuration and functionality of a semiconductor design |
US9418042B2 (en) | 2000-03-10 | 2016-08-16 | Synopsys, Inc. | Memory interface and method of interfacing between functional entities |
US8688879B2 (en) | 2000-03-10 | 2014-04-01 | Synopsys, Inc. | Memory interface and method of interfacing between functional entities |
US8959269B2 (en) | 2000-03-10 | 2015-02-17 | Synopsys, Inc. | Memory interface and method of interfacing between functional entities |
WO2004023292A3 (fr) * | 2002-09-06 | 2004-11-11 | Mips Tech Inc | Procede et appareil permettant d'eliminer des aleas au moyen d'instructions de saut |
US7000095B2 (en) | 2002-09-06 | 2006-02-14 | Mips Technologies, Inc. | Method and apparatus for clearing hazards using jump instructions |
US8171262B2 (en) | 2002-09-06 | 2012-05-01 | Mips Technology, Inc. | Method and apparatus for clearing hazards using jump instructions |
WO2004023292A2 (fr) * | 2002-09-06 | 2004-03-18 | Mips Technologies, Inc. | Procede et appareil permettant d'eliminer des aleas au moyen d'instructions de saut |
US9690630B2 (en) | 2006-12-01 | 2017-06-27 | Synopsys, Inc. | Hardware accelerator test harness generation |
US9003166B2 (en) | 2006-12-01 | 2015-04-07 | Synopsys, Inc. | Generating hardware accelerators and processor offloads |
TWI408601B (zh) * | 2008-08-21 | 2013-09-11 | Toshiba Kk | 管線操作處理器及控制系統 |
CN102194350A (zh) * | 2011-03-24 | 2011-09-21 | 大连理工大学 | 一种基于vhdl的cpu |
US9971516B2 (en) | 2016-10-17 | 2018-05-15 | International Business Machines Corporation | Load stall interrupt |
US10592116B2 (en) | 2016-10-17 | 2020-03-17 | International Business Machines Corporation | Load stall interrupt |
Also Published As
Publication number | Publication date |
---|---|
WO2000070483A3 (fr) | 2001-08-09 |
CN1217261C (zh) | 2005-08-31 |
EP1190337A2 (fr) | 2002-03-27 |
CN1355900A (zh) | 2002-06-26 |
AU4848700A (en) | 2000-12-05 |
TW589544B (en) | 2004-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6560754B1 (en) | Method and apparatus for jump control in a pipelined processor | |
US7000095B2 (en) | Method and apparatus for clearing hazards using jump instructions | |
US7010558B2 (en) | Data processor with enhanced instruction execution and method | |
US20030070013A1 (en) | Method and apparatus for reducing power consumption in a digital processor | |
WO2000070483A2 (fr) | Procede et appareil de segmentation et de reassemblage d'un processeur pipeline | |
US20020032558A1 (en) | Method and apparatus for enhancing the performance of a pipelined data processor | |
EP1190305B1 (fr) | Procede et appareil de controle d'emplacement de temporisation de branchement dans un processeur pipeline | |
JPH07120284B2 (ja) | データ処理装置 | |
US20060168431A1 (en) | Method and apparatus for jump delay slot control in a pipelined processor | |
WO2000070446A2 (fr) | Procede et appareil d'encodage de registre libre dans un processeur pipeline | |
EP1190303B1 (fr) | Procede et dispositif de commande de saut dans un processeur pipeline | |
Tahar et al. | A practical methodology for the formal verification of RISC processors | |
KR102379886B1 (ko) | 벡터 명령 처리 | |
Steven et al. | iHARP: a multiple instruction issue processor | |
WO2002057893A2 (fr) | Procede et appareil de reduction de la consommation d'energie dans un processeur numerique | |
JPH0384632A (ja) | データ処理装置 | |
JP2000029696A (ja) | プロセッサおよびパイプライン処理制御方法 | |
LaForest | Second-generation stack computer architecture | |
Hoseininasab et al. | Rapid Prototyping of Complex Micro-architectures Through High-Level Synthesis | |
Pulka et al. | Multithread RISC architecture based on programmable interleaved pipelining | |
Lutsyk et al. | Pipelining | |
Carmona et al. | Implementation of a fully pipelined ARM compatible microprocessor core | |
JP2785820B2 (ja) | 並列処理装置 | |
Casu et al. | Coupling latency-insensitivity with variable-latency for better than worst case design: A RISC case study | |
JP2001216154A (ja) | むき出しのパイプラインを具備するコードのサイズを、nop演算を命令オペランドとしてコード化することで削減するための方法並びに装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 00808458.0 Country of ref document: CN |
|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ CZ DE DE DK DK DM DZ EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
AK | Designated states |
Kind code of ref document: A3 Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ CZ DE DE DK DK DM DZ EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2000930715 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
WWP | Wipo information: published in national office |
Ref document number: 2000930715 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: JP |