WO1997041509A1 - Superscalar microprocessor including a high performance instruction alignment unit - Google Patents
- Publication number
- WO1997041509A1 PCT/US1996/006164
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- byte
- recited
- decode
- line
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3816—Instruction alignment, e.g. cache line crossing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/30149—Instruction analysis, e.g. decoding, instruction word fields of variable length instructions
- G06F9/30152—Determining start or end of instruction; determining instruction length
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3818—Decoding for concurrent execution
- G06F9/382—Pipelined decoding, e.g. using predecoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
- G06F9/384—Register renaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3854—Instruction completion, e.g. retiring, committing or graduating
- G06F9/3856—Reordering of instructions, e.g. using queues or age tags
Definitions
- This invention relates to superscalar microprocessors and more particularly to the alignment and dispatch of variable byte length computer instructions to a plurality of instruction decoders within a high performance and high frequency superscalar microprocessor.
- Superscalar microprocessors are capable of attaining performance characteristics which surpass those of conventional scalar processors by allowing the concurrent execution of multiple instructions. Due to the widespread acceptance of the x86 family of microprocessors, efforts have been undertaken by microprocessor manufacturers to develop superscalar microprocessors which execute x86 instructions. Such superscalar microprocessors achieve relatively high performance characteristics while advantageously maintaining backwards compatibility with the vast amount of existing software developed for previous microprocessor generations such as the 8086, 80286, 80386, and 80486.
- The x86 instruction set is relatively complex and is characterized by a plurality of variable byte length instructions.
- A generic format illustrative of the x86 instruction set is shown in Figure 1. As illustrated in the figure, an x86 instruction consists of from one to five optional prefix bytes 102, followed by an operation code (opcode) field 104, an optional addressing mode (Mod R/M) byte 106, an optional scale-index-base (SIB) byte 108, an optional displacement field 110, and an optional immediate data field 112.
- The opcode field 104 defines the basic operation for a particular instruction.
- The default operation of a particular opcode may be modified by one or more prefix bytes.
- A prefix byte may be used to change the address or operand size for an instruction, to override the default segment used in memory addressing, or to instruct the processor to repeat a string operation a number of times.
- The opcode field 104 follows the prefix bytes 102, if any, and may be one or two bytes in length.
- The addressing mode (Mod R/M) byte 106 specifies the registers used as well as memory addressing modes.
- The scale-index-base (SIB) byte 108 is used only in 32-bit base-relative addressing using scale and index factors.
- A base field of the SIB byte specifies which register contains the base value for the address calculation, and an index field specifies which register contains the index value.
- A scale field specifies the power of two by which the index value will be multiplied before being added, along with any displacement, to the base value.
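- The address calculation described for the SIB byte can be sketched in Python as follows. This is a minimal illustration rather than part of the disclosed hardware: the function names are hypothetical, the field layout (scale in bits 7-6, index in bits 5-3, base in bits 2-0) is the standard x86 encoding, and special-case encodings such as "no index register" are ignored.

```python
def decode_sib(sib):
    """Split an SIB byte into its (scale, index, base) fields."""
    scale = (sib >> 6) & 0b11    # power of two applied to the index value
    index = (sib >> 3) & 0b111   # register number holding the index value
    base = sib & 0b111           # register number holding the base value
    return scale, index, base


def effective_address(base_value, index_value, scale, displacement=0):
    """base + index * 2**scale + displacement, wrapped to 32 bits."""
    return (base_value + (index_value << scale) + displacement) & 0xFFFFFFFF


# Example: SIB byte 0x88 encodes scale=2, index register 1, base register 0.
scale, index_reg, base_reg = decode_sib(0x88)
address = effective_address(0x1000, 3, scale, displacement=8)
```

With a base value of 0x1000, an index value of 3, a scale field of 2 and a displacement of 8, the index is multiplied by four before the addition.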
- The next instruction field is the optional displacement field 110, which may be from one to four bytes in length.
- The displacement field 110 contains a constant used in address calculations.
- The optional immediate field 112, which may also be from one to four bytes in length, contains a constant used as an instruction operand.
- The shortest x86 instructions are only one byte long, and comprise a single opcode byte.
- The 80286 sets a maximum length for an instruction at 10 bytes, while the 80386 and 80486 both allow instruction lengths of up to 15 bytes.
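- As a rough sketch of how the field sizes above combine, the following Python function adds up the bytes of one instruction and checks the total against the 15-byte limit of the 80386 and 80486. The function name and parameters are hypothetical and exist only for illustration.

```python
def instruction_length(prefixes=0, opcode=1, modrm=False, sib=False,
                       displacement=0, immediate=0, max_length=15):
    """Total byte length of an instruction built from the generic x86 fields."""
    if not 0 <= prefixes <= 5:
        raise ValueError("zero to five prefix bytes")
    if opcode not in (1, 2):
        raise ValueError("opcode field is one or two bytes")
    if not 0 <= displacement <= 4 or not 0 <= immediate <= 4:
        raise ValueError("displacement and immediate are at most four bytes")
    total = (prefixes + opcode + int(modrm) + int(sib)
             + displacement + immediate)
    if total > max_length:
        raise ValueError("instruction exceeds the maximum length")
    return total


shortest = instruction_length()  # a single opcode byte
longer = instruction_length(prefixes=1, modrm=True, sib=True,
                            displacement=4, immediate=4)
```

Taking every field at its maximum size would give 17 bytes, so the 15-byte limit acts as a separate constraint on top of the individual field sizes.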
- The complexity of the x86 instruction set poses difficulties in implementing high performance x86 compatible superscalar microprocessors.
- One difficulty arises from the fact that instructions must be aligned with respect to the parallel-coupled instruction decoders of such processors before proper decode can be effectuated.
- Because the x86 instruction set consists of variable byte length instructions, the start bytes of successive instructions within a line are not necessarily equally spaced, and the number of instructions per line is not fixed. As a result, employment of simple, fixed-length shifting logic cannot in itself solve the problem of instruction alignment.
- An instruction alignment unit is provided which is capable of routing variable byte length instructions such as x86 instructions simultaneously to a plurality of decode units which form fixed issue positions within the superscalar microprocessor.
- The instruction alignment unit may be implemented with a relatively small number of cascaded levels of logic gates, thus accommodating very high frequencies of operation.
- A superscalar microprocessor includes an instruction cache for storing a plurality of variable byte-length instructions and a predecode unit for generating predecode tags which identify the location of the start byte of each variable byte-length instruction.
- An instruction alignment unit is configured to channel a plurality of the variable byte-length instructions simultaneously to predetermined issue positions depending upon the locations of their corresponding start bytes in a cache line. The issue position or positions to which an instruction may be dispatched is limited depending upon the position of the instruction's start byte within a line. By limiting the number of issue positions to which a given instruction of a line may be dispatched, the number of cascaded levels of logic required to implement the instruction alignment unit may be advantageously reduced.
- Instructions that have start bytes located at certain positions within a cache line may be restricted for dispatch to only one issue position, while instructions having start bytes at other positions within the cache line may be dispatched to one of a plurality of possible issue positions. By restricting the dispatch of those instructions having start bytes residing at certain positions within a line to a single issue position, the number of cascaded levels of logic may be reduced even further.
- The invention contemplates a superscalar microprocessor comprising an instruction cache for storing a plurality of variable byte-length instructions, a predecode unit coupled to the instruction cache for generating a predecode tag associated with each variable byte-length instruction, and a plurality of decode units capable of decoding the variable byte length instructions, wherein each of the plurality of decode units is associated with a fixed issue position.
- An instruction alignment unit is also coupled between the instruction cache and the plurality of decode units, wherein the instruction alignment unit is configured to channel the plurality of variable byte-length instructions to predetermined issue positions depending upon the predecode tag associated with each variable byte-length instruction.
- The invention further contemplates a superscalar microprocessor comprising an instruction cache for storing a plurality of variable byte-length instructions, a predecode unit coupled to the instruction cache for generating a predecode tag associated with each variable byte-length instruction, and a plurality of decode units capable of decoding the variable byte length instructions, wherein each of the plurality of decode units is associated with a fixed issue position.
- An instruction alignment unit is further coupled between the instruction cache and the plurality of decode units, wherein the instruction alignment unit is configured to channel a first instruction starting within a first predetermined range of positions within a cache line to a first decode unit and to channel a second instruction starting within a second range of positions within the cache line to a second decode unit.
- The invention additionally contemplates a method for aligning instructions within a superscalar microprocessor comprising the steps of storing a plurality of variable byte-length instructions within an instruction cache, predecoding the plurality of variable byte-length instructions to thereby provide a tag indicative of a boundary of each of the plurality of the variable byte-length instructions, and detecting predecode tags associated with a line of instructions within the instruction cache.
- The method comprises the further steps of routing a first instruction starting within a first range of positions within a cache line to a first decode unit, and routing a second instruction starting within a second range of positions within the cache line to a second decode unit.
- Figure 1 is a diagram which illustrates the generic x86 instruction set format.
- Figure 2 is a block diagram of a superscalar microprocessor which includes an instruction alignment unit to forward multiple instructions to six decode units.
- Figure 3 is a block diagram of the instruction alignment unit and six decode units.
- Figures 4A-4C are block diagrams which depict execution of an MROM instruction. While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
- Superscalar microprocessor 200 includes a prefetch/predecode unit 202 and a branch prediction unit 220 coupled to an instruction cache 204.
- Instruction alignment unit 206 is coupled between instruction cache 204 and a plurality of decode units 208A-208F (referred to collectively as decode units 208).
- Decode units 208A-208F are coupled to respective reservation station units 210A-210F (referred to collectively as reservation stations 210), and each reservation station 210A-210F is coupled to a respective functional unit 212A-212F (referred to collectively as functional units 212).
- Decode units 208, reservation stations 210, and functional units 212 are further coupled to a reorder buffer 216, a register file 218 and a load/store unit 222.
- A data cache 224 is finally shown coupled to load/store unit 222, and an MROM unit 209 is shown coupled to instruction alignment unit 206.
- Instruction cache 204 is a high speed cache memory provided to temporarily store instructions prior to their dispatch to decode units 208.
- Instruction cache 204 is configured to cache up to 32 kilobytes of instruction code organized in lines of 16 bytes each (where each byte consists of 8 bits).
- Instruction code is provided to instruction cache 204 by prefetching code from a main memory (not shown) through prefetch/predecode unit 202. It is noted that instruction cache 204 could be implemented in a set-associative, a fully-associative, or a direct-mapped configuration.
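- The cache geometry stated above implies 2048 lines and a 4-bit byte offset within each line. The short Python sketch below works through that arithmetic. The constant and function names are hypothetical, and no particular associativity is assumed, so the line address is not split further into index and tag.

```python
CACHE_BYTES = 32 * 1024   # 32 kilobytes of instruction code
LINE_BYTES = 16           # 16 bytes per line

NUM_LINES = CACHE_BYTES // LINE_BYTES       # number of cache lines
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # bits selecting a byte in a line


def split_address(addr):
    """Return (line_address, byte_position_within_line) for a byte address."""
    return addr >> OFFSET_BITS, addr & (LINE_BYTES - 1)


line_address, byte_position = split_address(0x1234)
```

The byte position is the quantity that matters for alignment, since it identifies where in the 16-byte line an instruction's start byte falls.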
- Prefetch/predecode unit 202 is provided to prefetch instruction code from the main memory for storage within instruction cache 204.
- Prefetch/predecode unit 202 is configured to burst 64-bit wide code from the main memory into instruction cache 204. It is understood that a variety of specific code prefetching techniques and algorithms may be employed by prefetch/predecode unit 202.
- As prefetch/predecode unit 202 fetches instructions from the main memory, it generates three predecode bits associated with each byte of instruction code: a start bit, an end bit, and a "functional" bit.
- The predecode bits form tags indicative of the boundaries of each instruction.
- The predecode tags may also convey additional information such as whether a given instruction can be decoded directly by decode units 208 or whether the instruction must be executed by invoking a microcode procedure controlled by MROM unit 209, as will be described in greater detail below.
- Table 1 indicates one encoding of the predecode tags. As indicated within the table, if a given byte is the first byte of an instruction, the start bit for that byte is set. If the byte is the last byte of an instruction, the end bit for that byte is set. If a particular instruction cannot be directly decoded by the decode units 208, the functional bit associated with the first byte of the instruction is set. On the other hand, if the instruction can be directly decoded by the decode units 208, the functional bit associated with the first byte of the instruction is cleared. The functional bit for the second byte of a particular instruction is cleared if the opcode is the first byte, and is set if the opcode is the second byte. In the latter case, the first byte is a prefix byte.
- The functional bit values for instruction byte numbers 3-8 indicate whether the byte is a MODRM or an SIB byte, as well as whether the byte contains displacement or immediate data.
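- A minimal Python sketch of the start-bit and end-bit tagging described above follows. It assumes the instruction lengths and the fast path/MROM classification are already known, which in hardware is the job of the predecode unit itself. The `Tag` class and `predecode` function are hypothetical names, and only the functional bit of the first byte is modeled because the encoding of the remaining bytes is given in Table 1, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tag:
    start: int = 0
    end: int = 0
    functional: Optional[int] = None  # None: not modeled in this sketch


def predecode(instructions: List[Tuple[int, bool]]) -> List[Tag]:
    """Tag each byte of a sequence of (length_in_bytes, is_mrom) instructions."""
    tags = []
    for length, is_mrom in instructions:
        for i in range(length):
            tag = Tag(start=int(i == 0), end=int(i == length - 1))
            if i == 0:
                # Set for MROM instructions, cleared for fast path instructions.
                tag.functional = int(is_mrom)
            tags.append(tag)
    return tags


def start_positions(tags: List[Tag]) -> List[int]:
    """Byte positions whose start bit is set."""
    return [pos for pos, tag in enumerate(tags) if tag.start]


# A one-byte fast path instruction followed by a three-byte MROM instruction.
tags = predecode([(1, False), (3, True)])
```

The list returned by `start_positions` is the information an alignment stage would need in order to decide where each instruction in a line begins.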
- Certain instructions within the x86 instruction set may be directly decoded by decode units 208. These instructions are referred to as "fast path" instructions.
- The remaining instructions of the x86 instruction set are referred to as "MROM instructions".
- MROM instructions are executed by invoking MROM unit 209. When an MROM instruction is encountered, MROM unit 209 parses and serializes the instruction into a subset of defined fast path instructions to effectuate a desired operation.
- A listing of exemplary x86 instructions categorized as fast path instructions as well as a description of the manner of handling both fast path and MROM instructions will be provided further below.
- Instruction alignment unit 206 is provided to channel or "funnel" variable byte length instructions from instruction cache 204 to fixed issue positions formed by decode units 208A-208F. As will be described in conjunction with Figures 3 and 4A-4C, instruction alignment unit 206 is configured to channel instruction code to designated decode units 208A-208F depending upon the locations of the start bytes of instructions within a line as delineated by instruction cache 204. In one embodiment, the particular decode unit 208A-208F to which a given instruction may be dispatched is dependent upon both the location of the start byte of that instruction as well as the location of the previous instruction's start byte, if any. Instructions starting at certain byte locations may further be restricted for issue to only one predetermined issue position. Specific details follow.
- Each of the decode units 208 includes decoding circuitry for decoding the predetermined fast path instructions referred to above.
- Each decode unit 208A-208F routes displacement and immediate data to a corresponding reservation station unit 210A-210F.
- Output signals from the decode units 208 include bit-encoded execution instructions for the functional units 212 as well as operand address information, immediate data and/or displacement data.
- The superscalar microprocessor of Figure 2 supports out of order execution, and thus employs reorder buffer 216 to keep track of the original program sequence for register read and write operations, to implement register renaming, to allow for speculative instruction execution and branch misprediction recovery, and to facilitate precise exceptions.
- A temporary storage location within reorder buffer 216 is reserved upon decode of an instruction that involves the update of a register to thereby store speculative register states.
- Reorder buffer 216 may be implemented in a first-in-first-out configuration wherein speculative results move to the "bottom" of the buffer as they are validated and written to the register file, thus making room for new entries at the "top" of the buffer.
- Each reservation station unit 210A-210F is capable of holding instruction information (i.e., bit encoded execution bits as well as operand values, operand tags and/or immediate data) for up to three pending instructions awaiting issue to the corresponding functional unit.
- It is noted that each decode unit 208A-208F is associated with a dedicated reservation station unit 210A-210F, and that each reservation station unit 210A-210F is similarly associated with a dedicated functional unit 212A-212F. Accordingly, six dedicated "issue positions" are formed by decode units 208, reservation station units 210 and functional units 212. Instructions aligned and dispatched to issue position 0 through decode unit 208A are passed to reservation station unit 210A and subsequently to functional unit 212A for execution. Similarly, instructions aligned and dispatched to decode unit 208B are passed to reservation station unit 210B and into functional unit 212B, and so on.
- Register address information is routed to reorder buffer 216 and register file 218 simultaneously.
- The x86 register file includes eight 32-bit real registers (i.e., typically referred to as EAX, EBX, ECX, EDX, EBP, ESI, EDI and ESP), as will be described further below.
- Reorder buffer 216 contains temporary storage locations for results which change the contents of these registers to thereby allow out of order execution. A temporary storage location of reorder buffer 216 is reserved for each instruction which, upon decode, modifies the contents of one of the real registers.
- Reorder buffer 216 may have one or more locations which contain the speculatively executed contents of a given register. If following decode of a given instruction it is determined that reorder buffer 216 has previous location(s) assigned to a register used as an operand in the given instruction, the reorder buffer 216 forwards to the corresponding reservation station either: 1) the value in the most recently assigned location, or 2) a tag for the most recently assigned location if the value has not yet been produced by the functional unit that will eventually execute the previous instruction. If the reorder buffer has a location reserved for a given register, the operand value (or tag) is provided from reorder buffer 216 rather than from register file 218. If there is no location reserved for a required register in reorder buffer 216, the value is taken directly from register file 218. If the operand corresponds to a memory location, the operand value is provided to the reservation station unit through load/store unit 222.
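- The register operand lookup described above can be sketched in Python as follows. This is an illustrative model under simplifying assumptions: the reorder buffer is a plain list ordered from oldest to most recently assigned location, the list index serves as the tag, and memory operands are not modeled. The function and variable names are hypothetical.

```python
def read_operand(reg, reorder_buffer, register_file):
    """Return ("value", v) or ("tag", t) for a register operand.

    reorder_buffer holds (destination_register, value_or_None) pairs, where
    None means the result has not yet been produced by a functional unit.
    """
    # Search from the most recently assigned location backwards.
    for tag in range(len(reorder_buffer) - 1, -1, -1):
        dest, value = reorder_buffer[tag]
        if dest == reg:
            if value is not None:
                return ("value", value)   # speculative value is available
            return ("tag", tag)           # wait for the pending result
    # No location reserved for this register: use the register file.
    return ("value", register_file[reg])


register_file = {"EAX": 1, "EBX": 2, "ECX": 3}
reorder_buffer = [("EAX", 5), ("EBX", None), ("EAX", None)]

eax = read_operand("EAX", reorder_buffer, register_file)
ecx = read_operand("ECX", reorder_buffer, register_file)
```

In this example EAX has two reserved locations, and the most recent one has no value yet, so a tag is forwarded. ECX has no reserved location and is read from the register file.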
- Reservation station units 210A-210F are provided to temporarily store instruction information to be speculatively executed by the corresponding functional units 212A-212F. As stated previously, each reservation station unit 210A-210F may store instruction information for up to three pending instructions. Each of the six reservation stations 210A-210F contains locations to store bit-encoded execution instructions to be speculatively executed by the corresponding functional unit and the values of operands. If a particular operand is not available, a tag for that operand is provided from reorder buffer 216 and is stored within the corresponding reservation station until the result has been generated (i.e., by completion of the execution of a previous instruction).
- Reorder buffer 216 ensures that data coherency is maintained in situations where read-after-write dependencies occur.
- Each of the functional units 212 is configured to perform integer arithmetic operations of addition and subtraction, as well as shifts, rotates, logical operations, and branch operations. It is noted that a floating point unit (not shown) may also be employed to accommodate floating point operations.
- Each of the functional units 212 also provides information regarding the execution of conditional branch instructions to the branch prediction unit 220. If a branch prediction was incorrect, instruction cache 204 flushes instructions not needed, and causes prefetch/predecode unit 202 to fetch the required instructions from main memory. It is noted that in such situations, results of instructions in the original program sequence which occur after the mispredicted branch instruction are discarded, including those which were speculatively executed and temporarily stored in load/store unit 222 and reorder buffer 216. Exemplary configurations of suitable branch prediction mechanisms are well known.
- Results produced by functional units 212 are sent to the reorder buffer 216 if a register value is being updated, and to the load/store unit 222 if the contents of a memory location are changed. If the result is to be stored in a register, the reorder buffer 216 stores the result in the location reserved for the value of the register when the instruction was decoded. As stated previously, results are also broadcast to reservation station units 210A-210F where pending instructions may be waiting for the results of previous instruction executions to obtain the required operand values.
- Load/store unit 222 provides an interface between functional units 212A-212F and data cache 224.
- Load/store unit 222 is configured with a store buffer with eight storage locations for data and address information for pending loads or stores.
- Functional units 212 arbitrate for access to the load/store unit 222. When the buffer is full, a functional unit must wait until the load/store unit 222 has room for the pending load or store request information.
- The load/store unit 222 also performs dependency checking for load instructions against pending store instructions to ensure that data coherency is maintained.
- Data cache 224 is a high speed cache memory provided to temporarily store data being transferred between load/store unit 222 and the main memory subsystem.
- Data cache 224 has a capacity of up to eight kilobytes of data. It is understood that data cache 224 may be implemented in a variety of specific memory configurations, including a set associative configuration.
- Figure 3 is a block diagram which depicts internal portions of one embodiment of instruction alignment unit 206 as well as internal portions of decode units 208A-208F with respect to a line of instruction code to be provided from instruction cache 204.
- Instruction alignment unit 206 is configured to channel variable byte length instructions (in this case certain x86 instructions referred to as fast path instructions) to decode units 208A-208F.
- A latching unit 302 is incorporated as a portion of an output buffer section 301 of instruction cache 204.
- Latching unit 302 is capable of storing a line of instruction code provided from a storage array (not shown in Figure 3) of instruction cache 204 prior to being dispatched to decode units 208.
- The instruction alignment unit 206 of Figure 3 includes a plurality of multiplexer circuits referred to as multiplexer channels 304A-304G coupled between latching unit 302 and decode units 208.
- A multiplexer control circuit 306 is further shown coupled to each multiplexer channel 304A-304G.
- Each decode unit 208A-208F includes an associated instruction decoder 318A-318F having an input port coupled to a respective multiplexer channel 304A-304F.
- Each decode unit 208A-208F further includes a respective displacement/immediate data buffer 330A-330F and a respective instruction issue unit 340A-340F.
- A line of instruction code to be executed is provided to latching unit 302 from the storage array of instruction cache 204.
- Each byte of instruction code within instruction cache 204 is associated with a corresponding predecode tag including a start bit, an end bit, and a functional bit.
- The predecode tag associated with each byte is provided to an input of multiplexer control circuit 306.
- Multiplexer control circuit 306 controls multiplexer channels 304A-304G such that the instruction bytes are selectively routed to designated instruction decoders 318A-318F. Instruction paths formed by decode units 208A-208F are referred to as issue positions.
- The channeling of instruction code through multiplexer channels 304A-304G is dependent upon the location of the start byte associated with each instruction relative to each line.
- Each of the first six multiplexer channels 304A-304F routes four contiguous bytes of instruction code from latching unit 302 to a respective instruction decoder 318A-318F.
- Multiplexer channel 304G is capable of channeling up to three contiguous bytes of instruction code to instruction decoder 318A.
- Table 2 below illustrates the possible multiplexer channels 304A-304G through which start bytes may be channeled. As stated previously, the channeling of instruction code is dependent upon the location(s) of start bytes within a given line. It is noted that each multiplexer channel 304A-304F is configured to route the lowest-order start byte among those allocated to it, provided the start byte has not been selected for routing by a lower order multiplexer channel.
- Table 2: Dispatches to Issue Positions
- Multiplexer channel 304A is capable of routing start bytes located at byte positions 0-2 to instruction decoder 318A.
- Multiplexer channel 304B is capable of routing start bytes at byte positions 1-4 to instruction decoder 318B.
- Multiplexer channel 304C is capable of transferring start bytes at byte positions 3-8 to decode unit 208C.
- Multiplexer channel 304D is capable of transferring start bytes at byte positions 6-10 to decode unit 208D.
- Multiplexer channel 304E is capable of transferring start bytes at byte positions 9-12 to decode unit 208E.
- Multiplexer channel 304F is capable of transferring start bytes at byte positions 12-15 to instruction decoder 318F.
- Start bytes located at byte positions 13-15 may alternatively be routed through multiplexer channel 304G to a seventh issue position which is employed to wrap bytes of an incomplete instruction (i.e., an instruction which extends into the next line) to the next cache line for decode.
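- The allocation of start bytes to multiplexer channels 304A-304F can be sketched in Python using the byte ranges listed above. This is a software model for illustration only, not the gate-level logic. The names are hypothetical, channel 304G (the wrap-around path) is left out, and the sketch assumes that dispatch stops at the first start byte that no remaining channel can accept so that program order is kept. That stopping rule is an assumption of the sketch and is not stated in the passage.

```python
# Byte positions each channel may route (from the ranges listed above).
CHANNEL_RANGES = [
    ("304A", range(0, 3)),    # positions 0-2
    ("304B", range(1, 5)),    # positions 1-4
    ("304C", range(3, 9)),    # positions 3-8
    ("304D", range(6, 11)),   # positions 6-10
    ("304E", range(9, 13)),   # positions 9-12
    ("304F", range(12, 16)),  # positions 12-15
]


def assign_channels(start_bytes):
    """Map start byte positions in one line to channels, in program order.

    Returns (routed, deferred): routed maps channel name to start byte,
    deferred lists start bytes left over for a later cycle (assumption).
    """
    routed = {}
    deferred = []
    next_channel = 0
    pending = sorted(start_bytes)
    for n, start in enumerate(pending):
        for idx in range(next_channel, len(CHANNEL_RANGES)):
            name, positions = CHANNEL_RANGES[idx]
            if start in positions:
                routed[name] = start   # lowest-order free channel wins
                next_channel = idx + 1
                break
        else:
            deferred = pending[n:]     # no channel left for this start byte
            break
    return routed, deferred


routed, deferred = assign_channels([0, 3, 5, 11])
```

Note that start bytes at positions 5 and 11 each fall within only one channel's range (304C and 304E respectively), which matches the single-issue-position restriction the description mentions for those positions.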
- instruction bytes routed through multiplexer channel 304G are provided to instruction decoder 304A upon the next clock cycle when the remaining bytes of that instruction are available within latching unit 302.
- The dispatch of the instruction to a designated position is dependent upon the nature of the remaining bytes of the instruction that appear on the next line. For situations where solely displacement or immediate data wrap around to the next cache line, that immediate or displacement data is provided to displacement/immediate data buffer 330F through multiplexer channel 304A. It is noted that in this situation, the preceding bytes of that instruction (which appear on the preceding cache line) will have been dispatched to instruction decoder 318F during the preceding clock cycle.
- The instruction information from the previous line is routed through multiplexer channel 304G to instruction decoder 318A, and is merged with the rest of the instruction code during the next clock cycle.
- The number of cascaded levels of logic required to implement the instruction alignment unit 206 may be advantageously reduced. Furthermore, by restricting the dispatch of an instruction having a start byte which resides at one of a select subset of byte locations within a line to a single issue position (i.e., byte positions 5 and 11), the number of cascaded levels of logic for instruction alignment may be reduced even further. Accordingly, the instruction alignment unit 206 as described above allows the implementation of a superscalar microprocessor having a relatively small number of gates per pipeline stage to thereby accommodate very high frequencies of operation. For relatively long instructions, although issue positions may be skipped, relatively high performance may still be achieved since other issue positions are available for remaining instructions within a cache line.
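The subset of single-position bytes can be checked directly against the ranges listed for Table 2. The following Python sketch counts how many channels can accept each byte position, with channel 304G included for positions 13-15. The names `CHANNEL_RANGES` and `single_channel_positions` are hypothetical, and a 16-byte line (positions 0-15) is assumed from the ranges given. Under those ranges the sketch reports positions 5 and 11, and also position 0, which only channel 304A can accept.

```python
# Inclusive byte-position ranges per multiplexer channel, as listed
# in the Table 2 description.
CHANNEL_RANGES = {
    "304A": (0, 2),
    "304B": (1, 4),
    "304C": (3, 8),
    "304D": (6, 10),
    "304E": (9, 12),
    "304F": (12, 15),
    "304G": (13, 15),  # wrap path to the seventh issue position
}


def single_channel_positions(ranges=CHANNEL_RANGES, line_length=16):
    """Return the byte positions that exactly one channel can accept."""
    result = []
    for pos in range(line_length):
        accepting = [name for name, (lo, hi) in ranges.items() if lo <= pos <= hi]
        if len(accepting) == 1:
            result.append(pos)
    return result


print(single_channel_positions())  # [0, 5, 11]
```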
- The defined fast path instructions may be up to eight bytes in length, and may include a single prefix byte. It is noted that by limiting the defined fast path instructions to only a single prefix byte, bytes 4 through 7 of any fast path instruction contain only displacement or immediate data. Therefore, for situations in which the instruction is greater than four bytes, the first four bytes of the instruction are routed through the multiplexer channel allocated to that instruction's start byte. The remaining bytes of the instruction are channeled through the next issue position's multiplexer channel.
- The instruction decoder of the issue position (i.e., instruction decoder) receiving the remaining bytes of the instruction detects the absence of a start bit at its first-byte position, and accordingly passes the data to the displacement/immediate data buffer 330 of the preceding issue position and issues a NOOP instruction.
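A minimal sketch of this splitting, under stated assumptions, is given below in Python. The function name `dispatch_fast_path`, the dictionary layout, and the example bytes are hypothetical. Issue positions are numbered from 0, and the case in which the tail would fall past the last issue position (the wrap path) is not modeled.

```python
MAX_FAST_PATH_LEN = 8  # fast path instructions are at most eight bytes
HEAD_LEN = 4           # bytes routed through the start byte's own channel


def dispatch_fast_path(instr, position):
    """Model how one fast path instruction occupies issue positions.

    Returns a dict mapping issue position -> what that position does.
    Bytes beyond the first four travel through the next position's
    channel, end up in the displacement/immediate buffer of the
    position that decodes the instruction, and the next position
    issues a NOOP.
    """
    if not 1 <= len(instr) <= MAX_FAST_PATH_LEN:
        raise ValueError("fast path instructions are 1 to 8 bytes long")
    head, tail = instr[:HEAD_LEN], instr[HEAD_LEN:]
    slots = {
        position: {"issues": "instruction", "decoded": head, "disp_imm_buffer": tail}
    }
    if tail:
        # The following decoder sees no start bit at its first byte,
        # forwards the data backward, and issues a NOOP.
        slots[position + 1] = {"issues": "NOOP", "decoded": b"", "disp_imm_buffer": b""}
    return slots
```

For a hypothetical six-byte instruction dispatched at issue position 2, the sketch shows position 2 decoding the first four bytes with the last two bytes in its displacement/immediate buffer, while position 3 issues a NOOP.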
- Each instruction decoder 318A-318F is capable of decoding only one instruction at a time. Accordingly, although the start bytes of more than one instruction may be provided to, for example, instruction decoder 318A, only the first instruction is decoded.
- Multiplexer channels 304 of instruction alignment unit 206 could be alternatively configured such that only a single instruction (or portions thereof), in accordance with the instruction's start and end predecode bits, is channeled to a given instruction decoder 318.
- Multiplexer channel 304G routes the preceding portions of the instruction to instruction decoder 318A, in which case the next instruction (corresponding to the first start byte within latching unit 302 during the next clock cycle) will be routed through multiplexer channel 304B to instruction decoder 318B.
- A sample sequence of x86 instructions is shown in Table 3 below. Instructions 1 through 7 in addition to the first byte of instruction 8 are shown within cache line 1. Cache line 2 begins with the second byte of instruction 8, and further includes instructions 9 through 16. Table 3. Sample Sequence of Instructions.
- Table 4 illustrates the manner in which the above sequence of instructions in Table 3 is dispatched to the decode units 208A-208F by instruction alignment unit 206.
- Instructions 1-5 are dispatched to issue positions 0-4 corresponding to decode units 318A-318E, respectively, during a first clock cycle.
- NOOP no operation
- Multiplexer control circuit 306 causes decode units 318A-318D to issue NOOP instructions. Since instruction 8 wraps around to the next cache line, the first byte of the instruction is wrapped around to instruction decoder 318 during the next clock cycle through multiplexer channel 304G.
- Instruction 8 is dispatched to issue position 0. It is noted that the first byte of instruction 8 is wrapped around from byte position 15 of the previous line. Instructions 9 and 10 are further dispatched to issue positions 1 and 2 through multiplexer channels 304B and 304C, respectively. Upon decode of instructions 8-10, instruction issue units 340D-E cause NOOP instructions to be issued.
- Instructions 11 and 12 are dispatched to issue positions 2 and 3 during clock cycle 4. Instruction 13 begins in byte 7, and cannot be routed to issue position 4. Therefore, the dispatch of instruction 13 must be held until the next clock cycle.
- Instructions 13 through 16 are dispatched to issue positions 2 through 5, respectively. Similar to the above, during decode of instructions 13-16, instruction issue units 340A and 340B cause NOOP instructions to be issued for issue positions 0 and 1.
- MROM unit 209 parses instructions into a series of fast path instructions which are dispatched during one or more clock cycles.
- When an MROM instruction within a line of code in latching unit 302 is detected by MROM unit 209, this instruction and any following it are not dispatched during the current cycle. Any instruction(s) preceding it are dispatched in accordance with the above description.
- MROM unit 209 provides a series of fast path instructions to the decode units 208 through instruction alignment unit 206 in accordance with the microcode for that particular MROM instruction. Once all of the microcoded instructions have been dispatched to decode units 208 through alignment unit 206 to effectuate the desired MROM operation, the instructions which followed the MROM instruction are allowed to be dispatched.
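The stall-and-resume behavior described above can be sketched as a simple scheduler in Python. This is a simplified model and not the circuit itself: `schedule_dispatch`, the microcode table, and any micro-operation names passed to it are hypothetical, the byte-position constraints of Table 2 are ignored, and up to six instructions are assumed to dispatch per cycle.

```python
ISSUE_POSITIONS = 6  # one per decode unit 208A-208F


def schedule_dispatch(instructions, microcode):
    """Group instructions into dispatch cycles.

    `instructions` is a list of instruction names in program order.
    `microcode` maps each MROM instruction name to its list of fast
    path instructions. An MROM instruction ends the current cycle for
    the instructions before it, then its microcode occupies whole
    cycles, and only afterwards do the following instructions resume.
    """
    cycles = []
    group = []
    for instr in instructions:
        if instr in microcode:
            if group:
                cycles.append(group)  # instructions preceding the MROM one
                group = []
            ops = microcode[instr]
            for i in range(0, len(ops), ISSUE_POSITIONS):
                cycles.append(ops[i:i + ISSUE_POSITIONS])
        else:
            group.append(instr)
            if len(group) == ISSUE_POSITIONS:
                cycles.append(group)
                group = []
    if group:
        cycles.append(group)
    return cycles
```

With a program of `MOV CX, S_LEN`, `CLD`, `REP MOVSB`, and one placeholder following instruction, and a made-up seven-entry microcode sequence for `REP MOVSB`, the sketch yields four cycles: the two leading instructions, six micro-operations, the seventh micro-operation, and then the following instruction.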
- Table 5 illustrates a sample x86 assembly language code segment containing an MROM instruction (REP MOVSB).
- Figures 4A-4C are block diagrams of portions of superscalar processor 200 depicting the dispatch and decode of the instructions of Table 5 during consecutive clock cycles.
- The first two instructions (MOV CX, S_LEN and CLD) are routed through multiplexer channels 304A and 304B to issue positions 0 and 1 (i.e., decode units 318A and 318B).
- Upon decode, MROM unit 209 further causes decode units 208C-208F to issue NOOP instructions.
- Microcoded instructions that effectuate the REP MOVSB instruction are dispatched during cycles 2 through N, as depicted by Figure 4B. During these cycles, a set of fast path instructions in accordance with the microcode stored in MROM unit 209 is dispatched through the instruction alignment unit 206 to decode units 208A-208F. It is noted that this MROM sequence may take several cycles to complete.
- MROM unit 209 causes decode units 208A-208C to issue NOOP instructions.
- Variable byte-length computer instructions may be dispatched to a plurality of instruction decoders during the same pipeline stage.
- The instruction alignment unit may be implemented using a relatively small number of cascaded levels of logic gates to thereby accommodate high frequencies of operation.
- Although instruction alignment unit 206 as described above in conjunction with Figures 2-4 is configured to selectively route instructions to the specific issue positions indicated by Table 2, other configurations are also possible. That is, the specific issue position or positions to which a given instruction within a line of memory is dispatched may be varied from that described above. It is further specifically contemplated that the number of issue positions provided within a superscalar microprocessor employing an instruction alignment unit in accordance with the invention may also vary.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP96915461A EP0896700A1 (en) | 1996-05-01 | 1996-05-01 | Superscalar microprocessor including a high performance instruction alignment unit |
PCT/US1996/006164 WO1997041509A1 (en) | 1996-05-01 | 1996-05-01 | Superscalar microprocessor including a high performance instruction alignment unit |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US1996/006164 WO1997041509A1 (en) | 1996-05-01 | 1996-05-01 | Superscalar microprocessor including a high performance instruction alignment unit |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997041509A1 (en) | 1997-11-06 |
Family
ID=22255029
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US1996/006164 WO1997041509A1 (en) | 1996-05-01 | 1996-05-01 | Superscalar microprocessor including a high performance instruction alignment unit |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP0896700A1 (en) |
WO (1) | WO1997041509A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11204768B2 (en) | 2019-11-06 | 2021-12-21 | Onnivation Llc | Instruction length based parallel instruction demarcator |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0467152A2 (en) * | 1990-07-20 | 1992-01-22 | Hitachi, Ltd. | Microprocessor capable of decoding two instructions in parallel |
EP0498654A2 (en) * | 1991-02-08 | 1992-08-12 | Fujitsu Limited | Cache memory processing instruction data and data processor including the same |
GB2263987A (en) * | 1992-02-06 | 1993-08-11 | Intel Corp | End bit markers for instruction decode. |
EP0651322A1 (en) * | 1993-10-29 | 1995-05-03 | Advanced Micro Devices, Inc. | Instruction caches for variable byte-length instructions |
EP0685788A1 (en) * | 1994-06-01 | 1995-12-06 | Advanced Micro Devices, Inc. | Programme counter update mechanism |
-
1996
- 1996-05-01 WO PCT/US1996/006164 patent/WO1997041509A1/en not_active Application Discontinuation
- 1996-05-01 EP EP96915461A patent/EP0896700A1/en not_active Withdrawn
Also Published As
Publication number | Publication date |
---|---|
EP0896700A1 (en) | 1999-02-17 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1. Designated state(s): CA CN JP KR |
| | AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | WWE | Wipo information: entry into national phase | Ref document number: 1996915461. Country of ref document: EP |
| | WWP | Wipo information: published in national office | Ref document number: 1996915461. Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: JP. Ref document number: 97538834. Format of ref document f/p: F |
| | NENP | Non-entry into the national phase | Ref country code: CA |
| | WWW | Wipo information: withdrawn in national office | Ref document number: 1996915461. Country of ref document: EP |