WO2003050723A1 - An interface for integrating reconfigurable processors into a general purpose computing system - Google Patents
An interface for integrating reconfigurable processors into a general purpose computing system
- Publication number
- WO2003050723A1 (PCT/US2002/010813)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- command
- data
- memory
- command list
- execution
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/24—Resetting means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
- G06F9/3879—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor for non-native instruction execution, e.g. executing a command; for Java instruction set
Definitions
- the present invention relates to a method and system for interfacing computer processors.
- the present invention relates to a computer system interface that provides for the integration of reconfigurable processors with instruction processors and other reconfigurable processors.
- Typical computing systems generally include traditional instruction processors for processing and controlling the instructions and data flow of a typical computer program.
- Computer instructions can also be implemented with hardware logic.
- Hardware logic-implemented functions can greatly accelerate the processing of application algorithms over those same algorithms implemented in software. Sections of code such as compute-intensive algorithms, and/or repetitious instructions can benefit from hardware implementation.
- Although computer instructions implemented in hardware logic run much faster than those implemented in software, hardware logic is much more expensive to design, develop, and manufacture than an average computer application.
- the computer code is divided into sections, generally during compilation. Depending on optimization requirements these sections of code can then be processed by one or more instruction processors, and/or one or more reconfigurable processors. Performance enhancements resulting from reconfigurable computing can provide orders of magnitude improvements in speed for a wide variety of computing code. However, the coordination between the instruction processors, reconfigurable processors, main memory, and various other components can degrade these performance improvements. Additionally, the increased number of components necessary to coordinate such a hybrid computer system can add significantly to the overall cost.
- the present invention is directed to an interface for integrating reconfigurable processors with standard instruction processors and/or other reconfigurable processors into a computer system that substantially obviates one or more of the problems due to limitations and disadvantages of the related art.
- the present invention includes an interface control processor, storage for the interface instructions, data registers, flag registers, user registers, and a direct memory access (“DMA”) processor.
- the reconfigurable processor interface is an active interface.
- the interface takes direction from instruction processors, reconfigurable processors, as well as the user logic within a reconfigurable processor, yet is capable of control and decision making on its own.
- This active control is defined through a coordinated effort of the various interface registers, specific areas of common memory, the dedicated reconfigurable processor memory, and the user logic of the reconfigurable processor with an interface control program.
- An object of the present invention is to provide a mechanism by which the instruction processors communicate and coordinate with arbitrary user logic.
- Another object of the present invention is to provide an active control that allows instruction processors and reconfigurable processors to function autonomously. Another object of the present invention is to provide a means for moving memory contents between common memory and the dedicated memory of a reconfigurable processor.
- a further object of the present invention is to provide a flexible interface that can be defined for arbitrary instruction processor instruction sequences and arbitrary reconfigurable user logic.
- Another object of the present invention is to ensure the protected operation of the interface by ensuring that the memory to be referred to is only within the boundaries of a user program.
- Another object of the present invention is to provide interaction with a user program without requiring operating system services.
- Yet a further object of the present invention is to provide a means for the reconfigurable processor to interact with input/output services.
- Another object of the present invention is to divide application programs among instruction processors and reconfigurable processors to achieve optimum performance.
- Another object of the present invention is to allow a single-system image to be constructed for a hybrid system of instruction processors and reconfigurable processors.
- Another object of the present invention is to allow a single-system image to be constructed for a system of purely reconfigurable processors.
- a further object of the present invention is to allow the construction of a cluster of Symmetric Multi-Processor ("SMP") hybrid nodes or strictly instruction nodes, or reconfigurable processor nodes, or various combinations thereof.
- Still a further object of the present invention is to allow instruction processors and reconfigurable processors to be integrated into a single-system image SMP architecture system.
- a further object of the present invention is to allow reconfigurable processors to be integrated into a single-system image SMP.
- Another object of the present invention is to allow the user logic of the reconfigurable processor to coordinate with the rest of a single-system image SMP system.
- the interface for integrating reconfigurable processors into a general purpose computing system provides the ability to coordinate the execution of an application that combines processor code and synthesized logic.
- the present invention provides an interface for integrating reconfigurable processors into a general purpose computing system.
- the interface includes: (1) an interface control processor for managing the interaction between the at least one reconfigurable processor and the computer system; (2) interface control processor instructions organized into command lists; (3) random access memory for storing the current command list; (4) registers for storing data and settings; (5) common memory; (6) direct memory access logic for coordinating the transfer of the reconfigurable processor instructions and data; (7) an address translation buffer for storing translation data and translating virtual memory addresses to physical memory addresses; and (8) dedicated memory.
- the present invention provides a method for processing interface control processor instructions compiled from a user application.
- This method includes the steps of: (1) storing command lists in a dedicated area of common memory; (2) receiving a fetch command sent by the user application; (3) fetching a command list from the dedicated area; (4) loading the command list into a command list memory within the interface; (5) processing the command list through an interface control processor; (6) interacting with user logic to exchange data and control signals; and (7) determining if another command list is ready for processing and, if so, processing the new command list; otherwise, waiting for another fetch command.
- FIGURE 1 is a simplified, high level, functional block diagram of a multiprocessor computer architecture, including reconfigurable processors.
- FIGURE 2 is a representational diagram of a reconfigurable processor used according to a preferred embodiment of the present invention.
- FIGURE 3 is a flow diagram of the method of halting the instruction processing of a reconfigurable processor.
- FIGURE 4 is a flow diagram of the method of restarting the execution of a reconfigurable processor after it has been halted.
- FIGURE 5 is a representational diagram of a common memory organization according to a preferred embodiment of the present invention.
- FIGURE 6 is a flow diagram of the method of processing interface control processor instructions.
- the present invention integrates a computer's instruction processors, common memory, and reconfigurable processors, referred to in the preferred embodiment as Multi-Adaptive Processors ("MAPs").
- In FIG. 1, an overview of the computer system incorporating MAPs 12 is shown.
- the instruction processor boards 10 and MAPs 12 reside on trunks 20 and 22 leading to the crossbar 16.
- the MAPs are connected via a multiplexer 14.
- the crossbar 16 is then connected to common memory 18 through memory trunk 26.
- Further embodiments of the present invention allow for placing the instruction processor 10 and/or the MAPs 12 in any other location, including, but not limited to, the memory trunk 26, crossbar switch 16, common memory inter-connect, or I/O bus 28.
- the instruction processors 100 and MAPs 12 may also be on the same or separate circuit boards, or even integrated on the same computer chip. Referring to Figure 2, the reconfigurable MAP 12 is described in more specific detail.
- a MAP 12 is a standalone processing unit executing independently of instruction processors during normal operation, as well as while loading and storing data.
- Each MAP 12 incorporates various hardware components of the preferred embodiment of the present invention, including for purposes of illustration only, an interface control processor, referred to as a command processor 42, command list memory 43, a direct memory access ("DMA") controller 44, a translation look-aside buffer ("TLB") 45, data registers 46, user data registers 48, flag registers 47, and on-board memory 50.
- Each MAP 12 also includes user-configured logic known as the user array 60.
- the user array of the present example contains two large Field Programmable Gate Arrays ("FPGAs"), which hold the user logic ("U_Logic").
- the command processor 42 processes commands, makes control decisions, and manages the transfer of data.
- a ComList is also generated during compilation of the user application.
- a ComList is used for controlling a MAP 12 or MAPs 12 0 - 12 N (Figure 1) during normal operation.
- the ComList contains a list of controlling instructions for the MAP's command processor 42 and is initially stored in common memory 18 (Figure 1) until it is fetched by the MAP's controller 40 into the ComList memory 43.
- Common memory addresses specified by commands or generated by user applications are virtual addresses. Addresses are translated, virtual to physical, by TLB 45 entries with the DMA controller 44.
- Data registers 46 consist of thirty-two 64-bit registers known as DR0 through DR31. These registers are generally used to hold addresses for both common memory 18 (Figure 1) and on-board memory 50; however, they can hold any needed data.
- scalar data can be sent to or received from the user array 60. Simple arithmetic operations, as well as bitwise logic, are supported on these registers. The data register contents can be tested for zero/non-zero; therefore, tests together with branches provide support for loops.
- the bits of DR0 are always 0/clear and the lower order bit of DR1 is always 1/set.
- DR0 contains integer 0 and DR1 contains integer 1. Neither register can be overwritten.
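The data register behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not the patented hardware: the class name `DataRegisters`, the helper `count_loop`, and the choice to silently ignore writes to DR0 and DR1 are assumptions (the text says only that these registers cannot be overwritten).

```python
MASK64 = (1 << 64) - 1  # data registers are 64 bits wide


class DataRegisters:
    """Hypothetical model of DR0..DR31 with DR0 = 0 and DR1 = 1 fixed."""

    def __init__(self):
        self._regs = [0] * 32
        self._regs[1] = 1  # DR1 always holds integer 1

    def read(self, n):
        return self._regs[n]

    def write(self, n, value):
        # Assumption: writes to DR0/DR1 are dropped. The source does not say
        # how an attempted overwrite is handled, only that it cannot succeed.
        if n >= 2:
            self._regs[n] = value & MASK64

    def is_zero(self, n):
        return self._regs[n] == 0


def count_loop(drs, counter_reg, start):
    """Loop built from a zero test plus a subtract of the constant in DR1."""
    drs.write(counter_reg, start)
    iterations = 0
    while not drs.is_zero(counter_reg):  # test for zero, branch if non-zero
        drs.write(counter_reg, drs.read(counter_reg) - drs.read(1))
        iterations += 1
    return iterations
```

Having the constants 0 and 1 always available means a countdown loop needs no immediate operand: the decrement comes from DR1 and the exit test is the zero/non-zero test.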
- User data registers 48 are thirty-two registers of 64 bits known as UDR0 through UDR31. User data can be sent directly from the ComList to the user array 60 without going through the data registers 46. A group of user data registers 48 can be loaded from the ComList. When the last user data register of the group is loaded the user array 60 is interrupted. The user array 60 must understand the protocol by which user data registers 48 need to be read. Flag registers 47 are thirty-two single-bit registers FR0 through FR31.
- Commands can test and wait on the state of flag registers 47 before execution. Commands can change the state of flag registers 47, forcing them set and clear. In addition, there are logical instructions that allow combining flag states (AND, OR, XOR). Other commands allow testing and branching on their contents. For control purposes FR0 is always 0/clear and FR1 is always 1/set. Neither of these registers can be overwritten.
- the purpose of the flag registers 47 is to support coordination between different operations.
- the user logic can coordinate loading and executing a new ComList by setting a specified flag register.
- a command can test the flag register and then branch to the end of the ComList, indicating that the next ComList may be loaded.
- the user logic can coordinate movement of data in or out of on-board memory 50 using flag registers and data registers.
- a DMA instruction can be held waiting for a flag register to be set.
- the user logic can write addresses and length into data registers, and then set a specified flag register, allowing the DMA instruction to execute the movement of data.
- another flag register can be set to indicate to the user logic that data movement is complete.
- An additional register visible in the MAP is the ComList pointer 49. This register is saved as part of the MAP status and indicates the index of the next command dispatched in the active ComList. When a MAP is halted the pointer indicates the next command to be executed if execution is restarted.
- On-board memory 50 consists of six banks of dual port 512K x 64-bit static random access memory ("RAM") providing a total of 24MB of available memory.
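The 24MB figure follows directly from the bank geometry. The short calculation below checks it; the variable names are illustrative only, and "K" and "MB" are taken in their binary senses (1024 and 1024 x 1024).

```python
BANKS = 6                      # six banks of on-board memory
WORDS_PER_BANK = 512 * 1024    # 512K words per bank
BYTES_PER_WORD = 64 // 8       # each word is 64 bits

total_bytes = BANKS * WORDS_PER_BANK * BYTES_PER_WORD
total_mb = total_bytes // (1024 * 1024)
```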
- Both the DMA controller 44 and the command processor 42 communicate with the on-board memory 50 via the six control-side ports 52.
- the user array 60 communicates with the on-board memory 50 through the six user-side ports 54.
- the user array 60 portion of a MAP 12 is configured as a hardware implementation of the algorithmic requirements of the user application.
- the user array 60 reads from, and writes to, on-board memory 50 through six user-side ports 54 and can interact with the control logic and the DMA controller 44.
- DMA operations within the interface of a MAP 12 run more or less independently of processing within a user's logic. Coordination between these control streams is through the use of the flag registers 47, data registers 46 and user data registers 48.
- the DMA controller 44 is initiated by a DMA ComList instruction.
- the user array 60 writes an address and length into the data registers 46 and sets a flag register 47.
- the ComList awaits the user array's signal, the setting of the flag register, and then executes a DMA instruction that references the data registers 46 that have been set up by the user array.
- the DMA controller 44 can then signal its completion by setting or clearing a flag register 47 that the user array is monitoring.
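The handshake between the user array and a waiting DMA instruction can be sketched as follows. This is a simplified, single-threaded model under stated assumptions: the class `MapState`, the function names, the register numbers passed in, and the clearing of the request flag after the transfer are all hypothetical and are not taken from the source.

```python
class MapState:
    """Hypothetical container for the flag and data registers of one MAP."""

    def __init__(self):
        self.flags = [False] * 32
        self.flags[1] = True      # FR1 is always set
        self.dr = [0] * 32
        self.dr[1] = 1            # DR1 always holds 1
        self.dma_log = []         # records the transfers that were started


def user_array_request(m, addr_reg, len_reg, go_flag, address, length):
    """User array side: write address and length, then raise the flag."""
    m.dr[addr_reg] = address
    m.dr[len_reg] = length
    m.flags[go_flag] = True


def comlist_dma_step(m, addr_reg, len_reg, go_flag, done_flag):
    """ComList side: the DMA instruction is held until the flag is set."""
    if not m.flags[go_flag]:
        return False              # still waiting on the user array
    m.dma_log.append((m.dr[addr_reg], m.dr[len_reg]))
    m.flags[go_flag] = False      # assumption: the request flag is consumed
    m.flags[done_flag] = True     # tell the user array the move is complete
    return True
```

A typical exchange calls `comlist_dma_step` (which returns `False` while waiting), then `user_array_request`, then `comlist_dma_step` again, after which the completion flag is set for the user array to observe.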
- the FPGAs, user logic 62 1 and 62 2, of the user array 60 are configured during a fetch configuration command from data in common memory and not from an on-board programmable read only memory ("PROM").
- User logic 62 1 and 62 2 communicate between themselves using three dedicated 64-bit data buses 66 without using any memory bandwidth.
- MAP modules can also be connected together via a chaining data bus 64. Through the use of chaining ports 64 a particular MAP can send partial results to another MAP, or similarly, can receive such partial results from another MAP.
- the user array 60 provides all logic for the chaining ports 64.
- the chaining ports 64 use Double Data Rate ("DDR") protocol that allows each data line to carry
- Chaining data flow is basically unidirectional, while the three onboard inter-connecting data buses 66 are bi-directional.
- MAPs can send interrupts to processors and can also receive a limited set of commands directly from the processors. Commands received directly from other processors, which are not fetched and executed from a MAP's ComList, are called direct commands. Interrupts are sent by a MAP 12 to a given processor in a twelve-bit serial stream along with twelve source-synchronous clock pulses.
- FR31 provides a specific system interrupt mechanism.
- MAP status is stored in its status and control area and an interrupt is generated. This status operation is conditional on an interrupt enable flag taken from a control word in the status and control area.
- a Stop switch is provided to add flexibility with handling interrupts sent to a MAP. During normal operation a MAP will run with Stop enabled. This mode allows execution to halt when the MAP is interrupted. However, for diagnostic and debugging purposes, a MAP can run in a Stop disabled mode, providing the MAP with the ability to continue execution of a program, even when it has been interrupted.
- Stopping/halting MAP execution can result from several sources. In all cases this requires setting or attempting to set FR31. When there is an attempt to set FR31 either directly, or indirectly, and Stop is enabled, execution of the MAP is halted.
- the MAP can be interrupted in the following situations: (1) a direct Stop command is received from a processor, (2) FR31 is set from the ComList, (3) FR31 is set from the user logic, or (4) the command processor detects an error, such as an address exception, page or memory fault, or some other internal fault.
- With Stop enabled, the following occurs (FIGURE 3): MAP control logic comes to the most graceful halt it can, waiting for execution logic to go quiet (step 200).
- the control state is saved in on-board memory addresses 0-31 (step 210). Status is stored into the MAP's status and control area of common memory (step 220).
- the MAP's TLB registers are stored into the TLB area (step 230). If a direct Stop command was received (step 240), the MAP is stopped (step 242); otherwise, if interrupts are enabled (step 250), an interrupt is sent (step 260) and the MAP unit halts (step 270). If Stop is disabled (step 250), the sequence above is executed except that in the last step the command that caused the error is abandoned and execution continues (step 252).
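One way to read the halt flow is as an ordered list of actions with a short decision tail. The sketch below is an interpretation, not the patented logic: the function name `halt_sequence`, its parameters, and the action strings are hypothetical, and it assumes the MAP still halts when interrupts are disabled (consistent with the statement elsewhere that an interrupt is only conditionally generated on a halt).

```python
def halt_sequence(direct_stop, interrupts_enabled, stop_enabled=True):
    """Return the actions taken when FR31 is set or an error is detected."""
    actions = [
        "wait for execution logic to go quiet",            # step 200
        "save control state to on-board memory 0-31",      # step 210
        "store status to status and control area",         # step 220
        "store TLB registers to TLB area",                 # step 230
    ]
    if not stop_enabled:
        # Stop disabled: same saves, but execution carries on.
        actions.append("abandon faulting command and continue")  # step 252
        return actions
    if direct_stop:
        actions.append("stop")                             # step 242
        return actions
    if interrupts_enabled:
        actions.append("send interrupt")                   # step 260
    actions.append("halt")                                 # step 270
    return actions
```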
- To restart execution after a halt (FIGURE 4), any necessary changes to the MAP's parameters in the status and control area are made, including the TLB entries (step 300). It is then determined whether processing is to continue from the interrupted command (step 310), the next command (step 320), or another command sequence (step 330). If continuing execution of the instruction that caused the halt, reload all saved parameters (step 312) and then Continue from Saved (step 314). If the halt was not caused by an error or TLB miss (so that no instruction is partially executed) and the status and control parameters do not require changing or updating, then the direct command Continue can be sent to the MAP (step 322).
- MAPs are actively controlled through the interface control program.
- the components of the interface control program include a
- Additional components, which coordinate to provide interface control, include the data registers 46, user data registers 48, flag registers 47, on-board memory 50, TLB 45 and user array 60.
- the MAP's interface is controlled by commands in the ComList or direct commands issued by an instruction processor or another MAP.
- an application generates code for the standard instruction processor, ComLists for the MAP's interface, and hardware logic for the MAP's user array.
- ComLists are generated by the user application in order to coordinate data movement and control between the application code running in the instruction processor, and application logic running in the MAP's user array.
- Commands in the ComList correspond directly to instructions in a reduced instruction set computer ("RISC") processor. These instructions are a small set of simple instructions for moving data, testing conditions, and branching.
- FPGA control processors can be reconfigured to function with various instruction sets, depending upon implementation needs.
- Each MAP 12 (Figure 2) is provided with two 4-KB ComList pages 72A and 72B from the ComList area 70 in common memory 18, providing a maximum usable space for each ComList of 512 words.
- Each ComList page is identified through a unique MAP ID number.
- the addresses of the ComList pages 72 1A, 72 1B through 72 NA, 72 NB are specified in conjunction with the ID number of the associated MAP 12.
- Starting at relative address 0K in each MAP's ComList page 72A+B is the first command list, ComList0 72A, for that MAP.
- the second command list, ComList1 72B, is located at address 4K of that MAP's ComList area 72A+B.
- Two ComList pages 72A and 72B are used to allow each MAP to work from one ComList page while the application software is loading the other ComList page. By swapping ComList pages the latencies of the processor and DMA are significantly reduced.
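The page arithmetic can be illustrated with a small sketch. The 4-KB page size, the 8-byte word, and the 0K/4K offsets come from the text; the contiguous per-MAP layout (`map_id * 8 KB` from a common base) and the names `comlist_page_address` and `area_base` are assumptions, since the source says only that page addresses are specified in conjunction with the MAP ID.

```python
PAGE_BYTES = 4 * 1024                      # each ComList page is 4 KB
WORD_BYTES = 8                             # commands are built from 8-byte words
WORDS_PER_PAGE = PAGE_BYTES // WORD_BYTES  # 512 words per page


def comlist_page_address(area_base, map_id, which):
    """Address of ComList0 (which=0) or ComList1 (which=1) for one MAP.

    Assumption: MAP areas are laid out back to back, two pages per MAP.
    """
    if which not in (0, 1):
        raise ValueError("each MAP has exactly two ComList pages")
    map_area = area_base + map_id * 2 * PAGE_BYTES
    return map_area + which * PAGE_BYTES   # ComList0 at 0K, ComList1 at 4K
```

With two pages per MAP, software can fill one page while the command processor works from the other, which is the double-buffering the text describes.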
- the first word in each ComList is the ComList Length Command that defines the total number of command words in the buffer. For example, if there were only one additional command in the ComList, the first word, the Length Command, would have the value 2. This indicates that the ComList contains the length command word plus one additional command word.
- the application software loads a ComList into an available ComList page, ComList0 or ComList1, of common memory (step 400). When ComList0 or ComList1 is ready to be processed, the instruction processor sends a corresponding direct instruction, Fetch0 or Fetch1, to the command processor (step 410).
- the ComList is fetched from common memory (step 420).
- the DMA only transfers enough cache lines to read all of the active ComList0.
- the complete ComList is stored in the ComList memory area 43 (Figure 2), RAM set aside specifically for the ComList, of the control logic (step 430). Placing the ComList in RAM allows the command processor 42 (Figure 2) to execute ComList loop instructions. However, the ComList commands can only loop within one ComList and not from one ComList to another.
- a FETCHDONE signal is provided to the MAP command processor 42 (Figure 2) and the ComList is executed (step 440).
- Each ComList is a maximum of 496 words in length, reserving the upper portion of the ComList memory area 43 (Figure 2) for bringing in 8 status and control words.
- a Fetch0 or Fetch1 direct command can be issued prior to completion of the current ComList, queuing up the next ComList.
- the command processor 42 (Figure 2) checks for the availability of another ComList (step 450). If another ComList is ready it is processed in the same manner as the preceding ComList; otherwise the MAP 12 (Figure 2) awaits further instructions (step 460).
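The fetch-and-execute cycle of steps 400 to 460 can be sketched as a small queue-driven loop. This is a minimal model under stated assumptions: the class `CommandProcessorModel` and its method names are hypothetical, commands are treated as opaque values rather than encoded words, and "executing" a command simply records it.

```python
from collections import deque


class CommandProcessorModel:
    """Hypothetical model of the double-buffered ComList processing loop."""

    def __init__(self, pages):
        self.pages = pages          # {0: ComList0 page, 1: ComList1 page}
        self.pending = deque()      # queued Fetch0 / Fetch1 direct commands
        self.comlist_memory = []    # stands in for ComList memory 43
        self.executed = []

    def fetch(self, which):
        """Direct command Fetch0 or Fetch1 (step 410); may be queued early."""
        self.pending.append(which)

    def run(self):
        while self.pending:                      # step 450: another ComList?
            page = self.pages[self.pending.popleft()]
            length = page[0]                     # first word is the length
            self.comlist_memory = page[:length]  # steps 420/430: copy only
                                                 # the active part of the page
            for command in self.comlist_memory[1:]:
                self.executed.append(command)    # step 440: execute in order
        # step 460: nothing queued, so wait for the next fetch command
```

For example, queuing `fetch(0)` and then `fetch(1)` before calling `run()` processes ComList0 followed by ComList1 without returning to the idle state in between.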
- If the command processor 42 (Figure 2) begins executing a ComList that has a length of over 496, an error interrupt and halt will be generated.
- commands are executed sequentially unless the sequence is changed with a branch or is halted. Commands are available to set addresses,
- ComList commands are either 8 bytes or 16 bytes long. All commands, except certain direct commands, are fetched by the DMA controller 44 (Figure 2) from the ComList pages 72A+B (Figure 5) in common memory 18 (Figure 5) and executed by the command processor 42 (Figure 2).
- When the command processor 42 (Figure 2) executes the last command in the ComList space, and it is not a taken branch, execution stops.
- a direct command of Fetch0 or Fetch1 is required to restart execution. If a Halt command is executed in a ComList sequence, either a Start or Continue direct command is required.
- An interrupt is conditionally generated if MAP execution halts.
- the data fields in the first 8 bytes of a ComList command are structured in the following manner:
- the ComList Length Command is the first command in each ComList and is always located at address 0 of the ComList.
- the Length Command provides the total number of command and immediate data words contained in that particular ComList.
- the value in field (hh) may range from 2 to 496 words. A value of 2 would indicate that only the ComList Length Command is present plus one command word.
- a value of 496 is the max number of command words that can be used in the buffer. This max limit allows scratchpad room at the top of the 512-word buffer for status and control information.
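The length rule lends itself to a simple check. In the sketch below the Length Command is simplified to a plain integer in word 0, because the bit layout of the command word is not given in this text; the function name `build_comlist` and the use of `ValueError` are likewise assumptions.

```python
COMLIST_BUFFER_WORDS = 512   # size of the ComList buffer
MAX_COMLIST_WORDS = 496      # largest permitted length value
MIN_COMLIST_WORDS = 2        # length word plus at least one command word


def build_comlist(command_words):
    """Prefix a list of command/immediate words with the length word.

    Simplification: word 0 holds the bare length rather than an encoded
    Length Command, whose field layout is not reproduced here.
    """
    length = 1 + len(command_words)      # the length word counts itself
    if not MIN_COMLIST_WORDS <= length <= MAX_COMLIST_WORDS:
        raise ValueError(f"ComList length {length} outside 2..496")
    return [length] + list(command_words)


# Words left at the top of the buffer for status and control information.
SCRATCHPAD_WORDS = COMLIST_BUFFER_WORDS - MAX_COMLIST_WORDS
```

A ComList with a single command therefore carries the value 2 in its first word, matching the example given for the Length Command.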
- Direct Memory Access Commands move data between common memory 18 (Figure 1) and a MAP's on-board memory 50 (Figure 2).
- Data is always moved to and from common memory 18 (Figure 1) in a stride-1 fashion: linearly with all bytes referenced.
- Stride or gather/scatter address sequences can be specified as on-board memory addresses.
- the stride or gather/scatter address offsets are sized indexes. This means that stride or gather/scatter indexes are multiplied by the operand stream data size (8 bytes) before being added to a specified base address. Stride and gather/scatter indexes can be negative. Overflow will not be detected.
- the index list is taken from on-board memory 50 ( Figure 2).
- Data in on-board memory can have the address reference sequence specified with a stride or with a sequence of gather/scatter addresses.
- the register DRee defines the stride and the format of the stride definition is as follows:
- the modulus field controls the number of banks in the bank sequence.
- a zero modulus indicates one bank in the bank sequence.
- a value 15 indicates a 16 bank sequence.
- the bank sequence has 16 fields of bank numbers. The bank numbers range from 0 to 7, where bank 7 is a no operation ("NOP") bank, meaning data will be discarded on a "write” and zero filled on a "read”.
- the stride field provides an increment for addresses in on-board memory. The address is incremented by the stride value after all banks in the sequence are addressed.
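The interaction of the modulus, bank sequence, and stride fields can be sketched as an address generator. This is one reading of the description, not a definitive implementation: the function name and argument order are hypothetical, the bit layout of the stride definition is not reproduced, and addresses are kept in abstract units because the text here does not say whether the stride is scaled by the operand size.

```python
NOP_BANK = 7  # data is discarded on a write and zero filled on a read


def bank_address_sequence(base, stride, modulus, bank_fields, count):
    """Yield `count` (bank, address) pairs for on-board memory references.

    modulus + 1 banks from `bank_fields` are visited in turn; the address
    advances by `stride` only after every bank in the sequence was addressed.
    """
    if not 0 <= modulus <= 15:
        raise ValueError("modulus selects 1 to 16 banks")
    if len(bank_fields) != 16 or any(not 0 <= b <= 7 for b in bank_fields):
        raise ValueError("expected 16 bank numbers in the range 0..7")
    n_banks = modulus + 1
    refs = []
    address = base
    for i in range(count):
        refs.append((bank_fields[i % n_banks], address))
        if (i + 1) % n_banks == 0:     # whole bank sequence has been used
            address += stride
    return refs
```

With a modulus of 1 and bank fields starting `0, 3`, consecutive operands alternate between banks 0 and 3 at the same address before the address moves on by the stride.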
- the gather/scatter indexes are 32-bit 4-byte quantities taken from on-board memory. Each index is added, sign-extended, to the address base in DRcc, after having been multiplied/shifted by the operand size.
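Gather/scatter address formation can be illustrated with the standard `struct` module. The 32-bit signed indexes, the multiplication by the 8-byte operand size, and the absence of overflow detection come from the text; the little-endian byte order, the function name, and the parameter names are assumptions.

```python
import struct

OPERAND_BYTES = 8  # operand stream data size


def gather_addresses(base, raw_index_bytes):
    """Turn packed signed 32-bit indexes into on-board byte addresses.

    `base` plays the role of the address base held in DRcc. The byte order
    of the index list is assumed to be little-endian.
    """
    count = len(raw_index_bytes) // 4
    indexes = struct.unpack(f"<{count}i", raw_index_bytes[:count * 4])
    # "i" decodes each 4-byte value as signed, which is the sign extension;
    # scaling by the operand size is the multiply/shift step.
    return [base + index * OPERAND_BYTES for index in indexes]
```

Because indexes are signed, an index of -1 addresses the operand immediately below the base.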
- the address in register DRaa is a byte address.
- All common memory addresses are virtual and are translated using translation look-aside buffers loaded from a MAP's status and control page.
- On-board memory addresses are physical and are not modified or translated.
- the direct memory access command sets or clears a flag register when the last specified data has been moved. For writes, this is done when the last data has been read from on-board memory 50 (Figure 2) and is on its way to common memory 18 (Figure 1). For reads, this is done when the last data has been written to on-board memory 50 (Figure 2).
- This command does not support waiting on a flag register value. If this is necessary, a DMA Flag Set command with the parameter fields set to wait for the needed value is available.
- the DMA Flag Set Command sets Flag register bits. It can do this when the DMA Flag Set command becomes available for issue or the command can wait for some DMA activity to be completed. This allows synchronization of the data movement of the DMA with the execution of a flag command. This command can also be followed with U_Data commands if it is desired to transfer command or data parameters to the user array after a DMA operation is complete.
- the DMA Flag Set Command is a two word command.
- the first word contains the command data and the second word contains immediate values for the flag register (i.e. bit2 is applied to FR2, bit3 to FR3, ..., bit31 to FR31).
- Any flag register that corresponds to a 1 bit in the mm field is set to a 1. Any flag register that corresponds to a 0 bit is not changed.
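The effect of the immediate word on the flag registers can be sketched as a bit-mask update. The function name `apply_flag_set` and the list-of-booleans representation are hypothetical; bits 0 and 1 are skipped here on the assumption that they have no effect, since FR0 and FR1 cannot be overwritten.

```python
def apply_flag_set(flags, mask):
    """Return the flag registers after a DMA Flag Set with immediate `mask`.

    A 1 bit sets the matching flag register; a 0 bit leaves it unchanged.
    """
    result = list(flags)
    for bit in range(2, 32):          # bit2 -> FR2 ... bit31 -> FR31
        if (mask >> bit) & 1:
            result[bit] = True
    return result
```

Note that this command can only set flags; a 0 bit never clears a flag that is already set.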
- the command lets the completion of a DMA action set any of the selected flag registers.
- Register-to-Register Data Register Arithmetic/Logic Commands do register-to-register integer add and subtract operations, and OR, AND, and XOR bitwise logical operations.
- DRcc is the minuend and DRdd is the subtrahend.
- Arithmetic operations are as follows: DRee ← DRcc ± DRdd.
- Bitwise logical operations are as follows: DRee ← DRcc OR/AND/XOR DRdd. Overflow is not detected.
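A small model of the register-to-register operations is sketched below. The function and operation names are hypothetical, and the 64-bit wraparound is an assumption: the text says only that overflow is not detected, and the 8-byte register width is taken from the load/store description.

```python
MASK64 = (1 << 64) - 1

# Each entry computes DRee from DRcc (first operand) and DRdd (second operand).
OPERATIONS = {
    "add": lambda cc, dd: cc + dd,
    "sub": lambda cc, dd: cc - dd,  # DRcc is the minuend, DRdd the subtrahend
    "or": lambda cc, dd: cc | dd,
    "and": lambda cc, dd: cc & dd,
    "xor": lambda cc, dd: cc ^ dd,
}


def alu(op: str, drcc: int, drdd: int) -> int:
    """Return the value written to DRee, truncated to 64 bits without any overflow flag."""
    return OPERATIONS[op](drcc & MASK64, drdd & MASK64) & MASK64


if __name__ == "__main__":
    print(hex(alu("sub", 5, 7)))       # wraps around instead of signalling overflow
    print(hex(alu("add", MASK64, 1)))  # wraps to zero
```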
- Immediate Data Register Arithmetic/Logic Commands add, subtract, OR, AND, or XOR 64-bit immediate values to Data Registers.
- The Immediate Data Register Arithmetic/Logic Command is a two-word command.
- The first word contains the command data and the second word contains the 64-bit immediate value.
- Single Data Register Load/Store Commands allow for a single Data register to be loaded from or stored to on-board memory. All 8 bytes are moved.
- The contents of DRee are loaded or stored according to the a parameter.
- The on-board memory address referred to is DRcc + DRdd. Overflow is not detected.
- Multiple Data Register Load/Store Commands are two-word commands using 32-bit immediate values.
- The first word contains the command data and the second word contains immediate values for the data registers (i.e., bit 2 is applied to DR2, bit 3 to DR3, ..., bit 31 to DR31). Any of the data registers can be moved to/from on-board memory. Each 1 bit in the immediate value enables the corresponding data register to be loaded or stored. Each Data register that has a corresponding 0 bit is ignored.
- The data in on-board memory is packed/compressed. (For example, if the immediate value has only two 1 bits corresponding to DR3 and DR10 and the on-board memory address is 1000 in a store operation, then DR3 is stored in on-board memory in word 1000 and DR10 is stored in word 1001.)
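The packed layout in the example above can be reproduced with a short sketch. The function names are hypothetical, on-board memory is modelled as a dictionary keyed by word address, and the loop covers bits 2 through 31 because that is the range the text names; ascending register order is inferred from the DR3/DR10 example.

```python
def store_multiple(data_regs, mask, memory, address):
    """Store each enabled data register to consecutive words; return the next free address."""
    for n in range(2, 32):
        if (mask >> n) & 1:
            memory[address] = data_regs[n]
            address += 1  # packed: no gap is left for registers with a 0 bit
    return address


def load_multiple(data_regs, mask, memory, address):
    """Load each enabled data register from consecutive words; return the next address."""
    for n in range(2, 32):
        if (mask >> n) & 1:
            data_regs[n] = memory[address]
            address += 1
    return address


if __name__ == "__main__":
    regs = list(range(100, 132))  # DRn holds 100 + n, purely for illustration
    memory = {}
    store_multiple(regs, (1 << 3) | (1 << 10), memory, 1000)
    print(memory)  # DR3 lands in word 1000, DR10 in word 1001
```

A load with the same mask and starting address reverses the store, since both walk the enabled registers in the same order.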
- Data Register Branch Command tests a data register for a zero/nonzero state. If the test is successful, a branch to the specified command in the ComList is taken; otherwise the next command in the ComList is executed.
- Valid command addresses are 1 through 495, provided that the address refers to a command word and not to an immediate data word. All bits in hh must be valid; otherwise execution is undefined.
- Register-to-Register Flag Register Logic Commands perform logic operations on the Flag registers.
- Three of the Flag Registers have special values or functions. FR0 is always 0 (clear), and FR1 is always 1 (set). If FR31 sets, and interrupts are enabled, an interrupt and halt are generated after control logic idles down command execution and stores status to the MAP's status and control area. If FR31 is already set when a command attempts to set it again, no further or secondary interrupt is generated from the attempt to set the Flag, but the interrupt Cause Bits are saved in MAP status. Setting FR31 will always halt MAP execution.
- Immediate Flag Register Logic commands perform logic operations on all 32 Flag registers at the same time. Each bit of the immediate value in the command is logically combined with the respective Flag Register, with results written back to each register bit. Bits 0 and 1 of the immediate value (bits 32 and 33 of the command) are ignored, as FR0 and FR1 are constant values. This command is similar to a DMA Flag Set command with the exception that the DMA wait bits are set to 0.
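The interaction between the immediate logic operation, the two constant flags, and the halting flag can be modelled as below. This is a sketch under assumptions: flags are packed into one integer with bit n standing for FRn, the function name and the set of operation names are hypothetical, and the halt is reported as a boolean rather than modelled as an interrupt sequence.

```python
MASK32 = 0xFFFFFFFF


def immediate_flag_logic(flags: int, immediate: int, op: str):
    """Combine all 32 flags with the immediate value; return (new_flags, halted)."""
    imm = immediate & MASK32
    if op == "or":
        result = flags | imm
    elif op == "and":
        result = flags & imm
    elif op == "xor":
        result = flags ^ imm
    else:
        raise ValueError(f"unsupported operation: {op}")
    # Immediate bits 0 and 1 have no effect: FR0 stays 0 and FR1 stays 1.
    result = ((result & ~0b11) | 0b10) & MASK32
    halted = bool((result >> 31) & 1)  # FR31 set always halts execution
    return result, halted


if __name__ == "__main__":
    print(immediate_flag_logic(0b10, 0b111, "or"))
    print(immediate_flag_logic(0b10, 1 << 31, "or"))
```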
- The Flag Register Branch Command tests a Flag Register for a zero/nonzero state. If the test is successful, a branch to the specified command in the ComList is taken; otherwise the next command in the ComList is executed.
- Valid command addresses are 1 through 495, provided that the address refers to a command word and not to an immediate data word. All bits in hh must be valid; otherwise execution is undefined.
- The DR Data U_Data Command sends command and parameter information from a specified data register to the user array. It is up to the logic that receives the data to interpret it as needed. This command transfers data from the DR defined in field dd to the User_Register defined in field cc. There are up to 32 User Registers that are contained in a section of RAM above where the Data Registers are located. The User Registers differ from the Data Registers in being unidirectional from the control logic to the user array. When the ComList has filled the desired number of User Registers, it should set the Last Transfer bit in field f, indicating that this is the last User Register of the group to be transferred.
- A user register available signal is raised and is available to the user logic when field f is set to indicate that the last User Register has been transferred. It is up to the user array to know which registers need to be fetched. For example, it may be that only User Register 00 is used. Or, possibly, only the first four User Registers are used. Another possible protocol is that User Register 00 contains a bit set for each of the User Registers to be fetched. The particular U_Logic algorithm implemented defines the User Register protocol.
- The Immediate Data U_Data Command is a 16-byte command that sends command and parameter information from an immediate value in the command to the user array. It is up to the logic that receives the data to interpret it as needed. This command transfers immediate data to the User_Register defined in field cc. There are up to 32 User Registers that are contained in a section of RAM above where the Data Registers are located. The User Registers differ from the Data Registers in being unidirectional from the control logic to the user array. When the ComList has filled the desired number of User Registers, it should set the Last Transfer bit in field f to indicate that this is the last User Register of the group transferred. A user register available signal is raised and is available to the user logic when field f is set to indicate that the last User Register has been transferred. It is up to the user array to know which registers need to be fetched. For example, it may be that only User Register 00 is used. Or, possibly, only the first four User Registers are used. Another possible protocol is that User Register 00 contains a bit set for each of the User Registers to be fetched. The particular U_Logic algorithm that is implemented defines the protocol used.
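One of the protocols mentioned above, in which User Register 00 carries a bitmap of the registers to be fetched, can be sketched as follows. Everything here is a hypothetical model: the class and method names are invented for illustration, and clearing the available signal once the user array has fetched its registers is an assumption the text does not state.

```python
MASK64 = (1 << 64) - 1


class UserRegisters:
    """Toy model of the 32 one-way registers between control logic and user array."""

    def __init__(self) -> None:
        self.regs = [0] * 32
        self.available = False  # models the "user register available" signal

    def u_data(self, cc: int, value: int, last: bool = False) -> None:
        """Control-logic side: write one user register; `last` models field f."""
        if not 0 <= cc < 32:
            raise ValueError("user register index out of range")
        self.regs[cc] = value & MASK64
        if last:
            self.available = True

    def fetch_by_bitmap(self) -> dict:
        """User-array side: read the registers flagged in User Register 00."""
        if not self.available:
            return {}
        bitmap = self.regs[0]
        self.available = False  # assumption: the signal drops after the fetch
        return {n: self.regs[n] for n in range(1, 32) if (bitmap >> n) & 1}


if __name__ == "__main__":
    ur = UserRegisters()
    ur.u_data(0, 0b110)          # bitmap: registers 1 and 2 carry parameters
    ur.u_data(1, 11)
    ur.u_data(2, 22, last=True)  # Last Transfer bit raises the available signal
    print(ur.fetch_by_bitmap())
```

The same class covers both U_Data variants, since they differ only in whether the value comes from a Data Register or from the command's immediate word.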
- U_Data commands/parameters can be sent after the execution of the DMA Flag Set command. Up to thirty-two 64-bit immediate values can be sent from either Data Registers or the second immediate data words. It is up to the receiving user array to read and interpret the information sent. In turn, the user array can send Flag information back to the Flag Registers. Flag information can be returned without having been solicited by other commands. Thus, the user array can indicate that an internal user array action is complete and that it is time to start an external action, such as starting a DMA sequence to store a result. The user array can also request an interrupt by setting FR31. An interrupt request from the user array is processed the same as if the interrupt came from any other source (data error, ComList command, illegal instruction, etc.).
- DMA activity can also interact with the user array.
- The DMA Flag Set command can be followed by U_Data commands. With multiple ways to control the Flag Registers, and given that all commands can test the Flag Registers, coordinated interaction is easily achieved among all three elements of the MAP: the user array, the DMA, and other ComList processing.
- Fetch Configuration Data Command requests that the DMA move an FPGA configuration block of data between common memory and the MAP.
- The Fetch Configuration Data Command must be the last command in a ComList.
- The command processor and the DMA will go into a special non-interruptible mode until the FPGA is configured.
- During configuration, normal ComList commands will not be fetched or executed from the ComList area of common memory.
- The user FPGAs are checked for proper configuration. When the FPGA has been properly configured, a status indication and interrupt will be returned to the processor indicating that the MAP is returning to normal operation.
- Direct Commands are sent directly to a MAP from an instruction processor or another MAP; therefore they are not included in a ComList. These commands give real-time, direct control to meet operating system needs and driver requirements. Direct commands have no parameters.
- Stopping/halting MAP execution can result from several sources. In all cases this requires setting or attempting to set FR31. When there is an attempt to set FR31 either directly, as with the ComList DMA Flag Set command, or indirectly, as with the direct Stop command, or as a result of some internal error, execution of the MAP is halted.
- Upon any halt or exception, with Stop enabled, the following occurs:
- MAP control logic comes to the most graceful halt it can, waiting for execution logic to go quiet.
- Control state is saved in on-board memory addresses 0-31. This state includes everything needed to continue execution, for example from a TLB/Page Table miss. This area in on-board memory is reserved for this use.
- The MAP's TLB registers are stored into the TLB area of its Base page.
- If the stop did not come from a direct command, and if interrupts are enabled, an interrupt is sent.
- If Stop is not enabled, the sequence above is executed except that in the last step the command that caused the error is abandoned and execution continues.
- Status consists of several registers as listed below.
- The 8-byte Index column shows the address offset from the start of a MAP's status and control area as an 8-byte word index.
- Positions not shown in the table are stored as 0 bits. Bit 3/Illegal is also set, and execution halted, if the last command in the ComList page is executed and is not a taken branch.
- Bit Parameters 0, 1, and 2 are the zero-values.
- The Interrupt ID parameter need not be valid if interrupts are disabled. Page Table Base must be valid and point to an end-of-table entry.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Advance Control (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02723783A EP1459219A4 (en) | 2001-12-05 | 2002-04-05 | An interface for integrating reconfigurable processors into a general purpose computing system |
AU2002254549A AU2002254549A1 (en) | 2001-12-05 | 2002-04-05 | An interface for integrating reconfigurable processors into a general purpose computing system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/011,835 | 2001-12-05 | ||
US10/011,835 US7155602B2 (en) | 2001-04-30 | 2001-12-05 | Interface for integrating reconfigurable processors into a general purpose computing system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2003050723A1 true WO2003050723A1 (en) | 2003-06-19 |
Family
ID=21752173
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2002/010813 WO2003050723A1 (en) | 2001-12-05 | 2002-04-05 | An interface for integrating reconfigurable processors into a general purpose computing system |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1459219A4 (en) |
AU (1) | AU2002254549A1 (en) |
WO (1) | WO2003050723A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5036473A (en) * | 1988-10-05 | 1991-07-30 | Mentor Graphics Corporation | Method of using electronically reconfigurable logic circuits |
US5329630A (en) * | 1988-03-23 | 1994-07-12 | Dupont Pixel Systems Limited | System and method using double-buffer preview mode |
US5448496A (en) * | 1988-10-05 | 1995-09-05 | Quickturn Design Systems, Inc. | Partial crossbar interconnect architecture for reconfigurably connecting multiple reprogrammable logic devices in a logic emulation system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5708830A (en) * | 1992-09-15 | 1998-01-13 | Morphometrix Inc. | Asynchronous data coprocessor utilizing systolic array processors and an auxiliary microprocessor interacting therewith |
US6134605A (en) * | 1998-04-15 | 2000-10-17 | Diamond Multimedia Systems, Inc. | Redefinable signal processing subsystem |
EP1061439A1 (en) * | 1999-06-15 | 2000-12-20 | Hewlett-Packard Company | Memory and instructions in computer architecture containing processor and coprocessor |
2002
- 2002-04-05 WO PCT/US2002/010813 patent/WO2003050723A1/en not_active Application Discontinuation
- 2002-04-05 AU AU2002254549A patent/AU2002254549A1/en not_active Abandoned
- 2002-04-05 EP EP02723783A patent/EP1459219A4/en not_active Withdrawn
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5329630A (en) * | 1988-03-23 | 1994-07-12 | Dupont Pixel Systems Limited | System and method using double-buffer preview mode |
US5036473A (en) * | 1988-10-05 | 1991-07-30 | Mentor Graphics Corporation | Method of using electronically reconfigurable logic circuits |
US5448496A (en) * | 1988-10-05 | 1995-09-05 | Quickturn Design Systems, Inc. | Partial crossbar interconnect architecture for reconfigurably connecting multiple reprogrammable logic devices in a logic emulation system |
US5452231A (en) * | 1988-10-05 | 1995-09-19 | Quickturn Design Systems, Inc. | Hierarchically connected reconfigurable logic assembly |
US5612891A (en) * | 1988-10-05 | 1997-03-18 | Quickturn Design Systems, Inc. | Hardware logic emulation system with memory capability |
US5657241A (en) * | 1988-10-05 | 1997-08-12 | Quickturn Design Systems, Inc. | Routing methods for use in a logic emulation system |
US5661662A (en) * | 1988-10-05 | 1997-08-26 | Quickturn Design Systems, Inc. | Structures and methods for adding stimulus and response functions to a circuit design undergoing emulation |
US5734581A (en) * | 1988-10-05 | 1998-03-31 | Quickturn Design Systems, Inc. | Method for implementing tri-state nets in a logic emulation system |
US5796623A (en) * | 1988-10-05 | 1998-08-18 | Quickturn Design Systems, Inc. | Apparatus and method for performing computations with electrically reconfigurable logic devices |
US5812414A (en) * | 1988-10-05 | 1998-09-22 | Quickturn Design Systems, Inc. | Method for performing simulation using a hardware logic emulation system |
US6002861A (en) * | 1988-10-05 | 1999-12-14 | Quickturn Design Systems, Inc. | Method for performing simulation using a hardware emulation system |
Non-Patent Citations (1)
Title |
---|
See also references of EP1459219A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP1459219A1 (en) | 2004-09-22 |
AU2002254549A1 (en) | 2003-06-23 |
EP1459219A4 (en) | 2006-02-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7155602B2 (en) | Interface for integrating reconfigurable processors into a general purpose computing system | |
US4648034A (en) | Busy signal interface between master and slave processors in a computer system | |
JP3983394B2 (en) | Geometry processor | |
JP6243935B2 (en) | Context switching method and apparatus | |
US5978838A (en) | Coordination and synchronization of an asymmetric, single-chip, dual multiprocessor | |
US5822606A (en) | DSP having a plurality of like processors controlled in parallel by an instruction word, and a control processor also controlled by the instruction word | |
EP0557884B1 (en) | Data processor having a cache memory and method | |
JP2834837B2 (en) | Programmable controller | |
US20040193837A1 (en) | CPU datapaths and local memory that executes either vector or superscalar instructions | |
KR100981033B1 (en) | Method and apparatus for interfacing a processor to a coprocessor | |
JP4226085B2 (en) | Microprocessor and multiprocessor system | |
WO2001016758A2 (en) | Double shift instruction for micro engine used in multithreaded parallel processor architecture | |
US20170147345A1 (en) | Multiple operation interface to shared coprocessor | |
US9910801B2 (en) | Processor model using a single large linear registers, with new interfacing signals supporting FIFO-base I/O ports, and interrupt-driven burst transfers eliminating DMA, bridges, and external I/O bus | |
US6915414B2 (en) | Context switching pipelined microprocessor | |
US6594711B1 (en) | Method and apparatus for operating one or more caches in conjunction with direct memory access controller | |
US20020053017A1 (en) | Register instructions for a multithreaded processor | |
JPH10143494A (en) | Single-instruction plural-data processing for which scalar/vector operation is combined | |
EP0447101A2 (en) | Processor with data format-independent instructions | |
WO2003050723A1 (en) | An interface for integrating reconfigurable processors into a general purpose computing system | |
KR19980018071A (en) | Single instruction multiple data processing in multimedia signal processor | |
US20040177224A1 (en) | Local memory with ownership that is transferrable between neighboring processors | |
WO2001025901A9 (en) | Efficient implementation of multiprecision arithmetic | |
JP2696578B2 (en) | Data processing device | |
CN113841126A (en) | Processor, system and method for storing register data elements |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG UZ VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
WWE | Wipo information: entry into national phase | Ref document number: 2002723783 Country of ref document: EP |
WWP | Wipo information: published in national office | Ref document number: 2002723783 Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: JP |
WWW | Wipo information: withdrawn in national office | Country of ref document: JP |
WWW | Wipo information: withdrawn in national office | Ref document number: 2002723783 Country of ref document: EP |