US20110035722A1 - Method for Specifying Stateful, Transaction-Oriented Systems for Flexible Mapping to Structurally Configurable In-Memory Processing Semiconductor Device - Google Patents

Info

Publication number
US20110035722A1
Authority
US
United States
Prior art keywords
memory
signals
flowgate
commutebuffers
flowlogic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/906,967
Inventor
Shridhar Mukund
Anjan Mitra
Jed Krohnfeldt
Clement Leung
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US 12/906,967
Publication of US20110035722A1
Status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G06F 15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G06F 15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F 15/7821 Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G06F 30/32 Circuit design at the digital level
    • G06F 30/327 Logic synthesis; Behaviour synthesis, e.g. mapping logic, HDL to netlist, high-level language to RTL or netlist
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G06F 30/32 Circuit design at the digital level
    • G06F 30/33 Design verification, e.g. functional simulation or model checking
    • G06F 30/3323 Design verification using formal methods, e.g. equivalence checking or property checking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G06F 30/34 Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
    • G06F 30/347 Physical level, e.g. placement or routing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G06F 30/39 Circuit design at the physical level
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • SOC: system on a chip
  • Verilog: hardware design language
  • VHDL: hardware design language
  • Current efforts to improve design productivity have aimed at design capture at a higher level of abstraction, via more algorithmic/system approaches such as C++, C, SystemC and System Verilog.
  • CAD tools for placement and route of synthesized logic netlists have delivered limited success in addressing the physical design requirements of deep submicron process technologies.
  • the semiconductor industry needs a design methodology and a supporting tool suite that can improve productivity through the entire design cycle, from design capture and verification through physical design, while guaranteeing product manufacturability at the same time.
  • SOC implementations of stateful, transaction-oriented applications depend heavily on on-chip memory bandwidth and capacity for performance and power savings. Placement and routing of a large number of memory modules becomes another major bottleneck in SOC physical design.
  • processor-in-memory architectures are driven by requirements to support advanced software programming concepts such as virtual memory, global memory, dynamic resource allocation, and dynamic load balancing.
  • the hardware and software complexity and costs of these architectures are justified by the requirement to deliver good performance for a wide range of software applications. Due to these overheads, multiple processor-in-memory chips are required in any practical system to meet realistic performance and capacity requirements, as witnessed by the absence of any system product development incorporating a single processor-in-memory chip package.
  • the present invention fills these needs by providing a method and apparatus for performing in-memory computation for stateful, transaction-oriented applications. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, or a device. Several inventive embodiments of the present invention are described below.
  • a method for specifying stateful, transaction-oriented systems begins with designating a plurality of primitive FlowModules.
  • the method includes defining at least one FlowGate within each of the plurality of FlowModules, wherein each FlowGate includes a non-interruptible sequence of procedure code and a single point of entry, and is invoked by a named concurrent call.
  • An Arc is designated from a calling FlowGate to a called FlowGate and a Signal is generated for each named invocation of the called FlowGate.
  • a Channel is defined for carrying the Signal.
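  • As a hedged illustration of the capture style summarized above, the following C++ sketch models two primitive FlowModules, a FlowGate entry point, and a Signal generated across an Arc. FlowLogic's actual syntax is not reproduced in this document, so every type and identifier below is a hypothetical stand-in:

      // Hypothetical C++ rendering of the specification steps above.
      // The Arc is modeled as the Channel (queue) carrying each Signal
      // generated by a named invocation of the called FlowGate.
      #include <cstdint>
      #include <deque>

      struct Signal { uint32_t targetGateId; uint32_t payload; };
      using Arc = std::deque<Signal>;          // Channel carrying Signals

      class ProducerModule {                   // primitive FlowModule
          uint32_t produced = 0;               // State
      public:
          // FlowGate: single point of entry; each named invocation of the
          // called FlowGate is expressed by generating one Signal.
          void onTick(Arc& toConsumer) {
              ++produced;
              toConsumer.push_back({/*targetGateId=*/1, produced});
          }
      };

      class ConsumerModule {                   // primitive FlowModule
          uint64_t sum = 0;                    // State
      public:
          // Called FlowGate: non-interruptible, runs to completion.
          void onItem(const Signal& s) { sum += s.payload; }
      };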
  • a method for synthesizing a stateful, transaction-oriented system for flexible mapping to a structurally field-configurable semiconductor device having a multi-level array of storage elements, for in-memory processing begins with mapping FlowLogic to a network of FlowVirtualMachines (FVM).
  • a FlowModule is mapped into a corresponding FlowVirtualMachine (FVM) and one or more FVMs are integrated into an AggregateFVM (AFVM).
  • One or more AFVMs are composed into a FlowTile, and Signals are routed between FlowModules.
  • a method for routing FlowLogic Signals over a structurally configurable in-memory processing array begins with configuring a pool of memory resource units into corresponding OutputBuffers, CommuteBuffers and ChannelMemories, the pool of memory units shared with a FlowLogicMachine.
  • a producer-consumer relationship between the corresponding OutputBuffers and CommuteBuffers is configured and a producer-consumer relationship between the CommuteBuffers and VirtualChannels residing in the ChannelMemories is configured.
  • Producer-consumer relationships between the OutputBuffers and VirtualChannels residing in said ChannelMemories are configured and producer-consumer relationships between the CommuteBuffers and neighboring CommuteBuffers are configured.
  • a method for debugging a stateful, transaction-oriented runtime system having a multi-level array of storage elements includes instructing the stateful transaction oriented system to pause and instructing the stateful transaction oriented system to single step until a given point. Information for selected FlowGate invocations is tracked and areas within the multi-level array of storage elements are queried for the debugging session.
  • FIG. 1 is a high-level simplified schematic diagram of flow modules in accordance with one embodiment of the invention.
  • FIG. 2 is a simplified schematic diagram illustrating the data path of a compute element of a tile in accordance with one embodiment of the invention.
  • FIG. 3 depicts the notion of FlowTunnels in accordance with one embodiment of the invention.
  • FIG. 4 is a simplified schematic diagram illustrating a logical view for the execution of a FlowModule in accordance with one embodiment of the invention.
  • FIG. 5 is a simplified schematic diagram illustrating the ability to aggregate several flow modules into one aggregate structure in accordance with one embodiment of the invention.
  • FIG. 6 is a high-level schematic diagram illustrating a tile that supports a corresponding set of virtual processors in accordance with one embodiment of the invention.
  • FIG. 7 is a high-level simplified schematic illustrating an architectural view of a FlowLogicMachine in accordance with one embodiment of the invention.
  • FIG. 8 is a simplified schematic diagram illustrating the data flow within a flow logic machine in accordance with one embodiment of the invention.
  • FIG. 9 is a simplified schematic diagram illustrating a Tile having an adapter that interfaces the Tile with an external device in accordance with one embodiment of the invention.
  • FIG. 10 is a flowchart diagram illustrating the method operations for configuring and programming a semiconductor circuit device having a multiple level array of memory storage cells in accordance with one embodiment of the invention.
  • the embodiments of the present invention described below provide a method and apparatus enabling a flexible design capture methodology, which allows a designer to select the granularity at which a stateful, transaction-oriented application is captured.
  • An efficient methodology to implement a stateful, transaction-oriented application on a platform economically superior with respect to design effort, implementation costs and manufacturability is further described below.
  • the embodiments utilize an execution model that allows for efficient compiler optimization and resource allocation, efficient hardware implementation, and accurate performance analysis and prediction when a design is captured and analyzed. It should be appreciated that no significant uncertainty is introduced by design compilation, mapping into the physical platform, or resource conflicts during system operation. The resource requirements are specified explicitly when the design is captured, using annotations or compiler analysis. Allocation of hardware resources can be determined statically at compile time.
  • a simple and effective chip architecture that uses a single level real memory organization to eliminate the costs of managing a caching hierarchy associated with virtual memory systems in applications development, compiler optimization, run-time system support, and hardware complexity is provided.
  • the embodiments described herein meet the tremendous demands of memory capacity and bandwidth in future generation SOCs with solutions that are economical in die area, product development cycle and power consumption.
  • the embodiments reap the cost, performance and power consumption benefits of advanced deep submicron fabrication processes with exceedingly high manufacturability and reliability.
  • FIG. 1 is a high-level simplified schematic diagram of FlowModules in accordance with one embodiment of the invention.
  • FlowModules 100 a through 100 d represent objects in accordance with one embodiment of the invention.
  • FlowModules 100 a through 100 d are mostly comprised of memory arrays in this embodiment.
  • FlowModule 100 a includes FlowMethod 104, States 106, and FlowGate 112.
  • Signals 102 are processed and commuted between FlowModules through FlowGates 112 .
  • Signals 102, which may be referred to as messages, are in packet format in one embodiment of the invention.
  • the primary inputs and outputs into the FlowLogic architecture are also Signals.
  • Arc 108 represents a channel through which data flows between FlowGates 112 .
  • Arcs 108 represent queues and Signals 102 are transmitted through Arcs 108 .
  • FlowModules 100 represent objects, defining codes and data allocated to memory.
  • FIG. 1 further illustrates FlowModule 100 a and FlowModule 100 b within hierarchical FlowModule 110 .
  • FlowModule 100 a and FlowModule 100 b are grouped within hierarchical FlowModule 110 for convenience in one embodiment. In other words, the grouping of FlowModule 100 a and FlowModule 100 b may be analogized to an alias.
  • Arcs 108 may be characterized as a ForwardArc 108 a, a CallForwardArc 108 b or a TimedArc 108 c in one embodiment. The details for these types of Arcs are provided below. It should be appreciated that Arcs 108 are created in application-specific fashion. FlowGates 112 are invoked through an external Signal and are akin to a function call.
  • PrimitiveFlowModules are concurrent entities that include FlowGates 112 , States 106 , and FlowMethods 104 .
  • Arcs 108 emanate from a FlowGate and terminate at a FlowGate.
  • An Arc can carry one or more Signals at a given time.
  • a FlowGate is invoked by a Signal instance, i.e., a Signal instance is targeted to invoke a specific FlowGate.
  • a Signal instance is a stream of bytes that carries the necessary arguments, which may be a small message, a large packet, or anything in between.
  • a Signal also may carry a priority-class attribute. Signals within a class (priority-class) are guaranteed to arrive in the order they were generated at the head of the Arc. It should be appreciated that FlowGate 112 does not have a state of its own. FlowGate 112 can modify the state of the FlowModule it resides in and the FlowGates may generate one or more Signals and thereby invoke one or more FlowGates concurrently. In one embodiment, FlowGate 112 may be thought of as an indivisible and un-interruptible sequence of procedural code that typically terminates after a short burst of execution.
  • FlowLogic guarantees that one and only one FlowGate within a FlowModule is active at any time and a FlowGate once started is guaranteed to complete.
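  • To make these dispatch rules concrete, here is a hedged C++ sketch of a per-FlowModule run-to-completion loop: each arriving Signal selects exactly one FlowGate through a FlowGateIndex, and that gate finishes before the next Signal is dispatched, so at most one FlowGate per FlowModule is ever active. The runtime types and names are illustrative assumptions, not the patent's implementation:

      #include <cstdint>
      #include <deque>
      #include <functional>
      #include <unordered_map>
      #include <vector>

      struct Signal { uint32_t gateId; std::vector<uint8_t> payload; };

      class FlowModuleRuntime {
          // FlowGateIndex: unique FlowGate identifier -> entry point.
          std::unordered_map<uint32_t,
                             std::function<void(const Signal&)>> gateIndex;
          std::deque<Signal> channel;   // Signals arrive in order per class
      public:
          void registerGate(uint32_t id,
                            std::function<void(const Signal&)> gate) {
              gateIndex[id] = std::move(gate);
          }
          void post(Signal s) { channel.push_back(std::move(s)); }
          void step() {                 // dispatch exactly one Signal
              if (channel.empty()) return;
              Signal s = std::move(channel.front());
              channel.pop_front();
              gateIndex.at(s.gateId)(s);  // gate runs to completion here
          }
      };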
  • FlowMethods, as used herein, are sequential bodies of code, e.g., C-style function calls, that the FlowGates within a FlowModule may use to achieve their end goals.
  • Hierarchical FlowModules comprise one or more FlowModules 100 a - d, and are largely used to facilitate FlowLogic code reuse and interface specification exchange.
  • a TimedArc is a special case of an Arc, where the constituent Signals carry a Timer.
  • the constituent Signals in a TimedArc invoke the corresponding FlowGate out of order, as and when the Timer expires.
  • TimedArcs are specifically constrained to originate and terminate within the same FlowModule.
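  • One way such timer semantics might be realized is with a queue ordered by expiry time, so Signals fire on expiry rather than in send order. The patent does not prescribe a data structure, so the following C++ sketch is an assumption for illustration only:

      #include <chrono>
      #include <functional>
      #include <queue>
      #include <vector>

      using Clock = std::chrono::steady_clock;

      struct TimedSignal {
          Clock::time_point expiry;     // the Timer carried by the Signal
          int gateId;                   // FlowGate to invoke on expiry
          bool operator>(const TimedSignal& o) const { return expiry > o.expiry; }
      };

      class TimedArc {                  // originates and terminates in one module
          std::priority_queue<TimedSignal, std::vector<TimedSignal>,
                              std::greater<>> pending;   // earliest expiry first
      public:
          void send(TimedSignal s) { pending.push(s); }
          // Collect gates whose Timers have expired, regardless of send order.
          std::vector<int> expired(Clock::time_point now) {
              std::vector<int> fire;
              while (!pending.empty() && pending.top().expiry <= now) {
                  fire.push_back(pending.top().gateId);
                  pending.pop();
              }
              return fire;
          }
      };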
  • a ForwardArc is another special case of an Arc 108 whose destination is implied by Signals carried by a CallForwardArc. It should be appreciated that the notion of threads or processors does not exist in the FlowLogic description. FlowLogic can be thought of as a set of interacting pipelines of Signal flows.
  • the FlowLogic architecture can be used to describe an arbitrary transaction-oriented application using an arbitrary number of interconnected FlowLogic components. Isochronous systems can also be described with reasonable timing resolution. It should be noted that FlowLogic is not meant for traditional digital logic system design where cycle accuracy and deterministic behavior is paramount. Systems designed using FlowLogic are non-deterministic, but can have well-known end-to-end functional behavior independent of the delays in the Arc. Arcs are guaranteed not to drop Signals unless they are attributed specifically to do so. The quantitative or performance behavior of the system may change depending on the parameters of the Arcs, including delay (latency), capacity, priority and so forth.
  • the FlowLogic architecture allows flexible design space exploration of performance and quantitative behavior, followed by flexible mapping of the results into the said structurally field-configurable semiconductor device.
  • the parameters related to Arcs 108 are determined interactively during system simulations using FlowLogic. It may be noted that the performance behavior of such systems will only be as good as the traffic pattern assumptions made in the simulation.
  • FlowGates referred to as DynamicFlowGates can be dynamically loaded and linked at run-time.
  • DynamicFlowGates are limited to serving the purposes of run-time system diagnostics and debug.
  • FIG. 2 shows an alternative structural view to the FlowLogic system in accordance with one embodiment of the invention.
  • FlowModules 100 a through 100 d are interconnected through a set of Arcs or Channels. These Arcs or Channels of FIG. 2 may be classified as Random Read Channels 116, Priority Class Channel 114, or Random Access Channel 118, in accordance with one embodiment of the invention.
  • the FlowModules are mainly composed of memory regions, and Channels 114, 116, and 118 provide the wiring for communication between these memory regions. It should be appreciated that Channels of different types and capacities are inferred interactively from a FlowLogic description via annotations.
  • Signal types carry attributes that determine the range of priority-class, type and capacity of the Channel.
  • a set of Arcs between two FlowModules map into one or more virtual Channels depending on the Signal types that the Arcs carry.
  • a Channel can be thought of as a uni-directional memory element with FlowMethods for producer writes, consumer reads, and synchronization and flow control.
  • a Channel may be a first-in-first-out (FIFO) serial queue.
  • a Channel may be serial-write, random-read for the purposes of filtering and classification functions.
  • a Channel may comprise random-write and random-read ability to exchange semaphores.
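  • The simplest of these variants, the FIFO Channel, might look like the following hedged C++ sketch: a uni-directional queue with producer writes, consumer reads, and a capacity bound standing in for flow control (the serial-write/random-read and semaphore variants would add indexed access on top of the same storage). Identifiers are illustrative:

      #include <cstddef>
      #include <cstdint>
      #include <deque>
      #include <optional>
      #include <vector>

      class FifoChannel {
          std::deque<std::vector<uint8_t>> slots;   // queued Signals
          std::size_t capacity;                     // fixed at configuration
      public:
          explicit FifoChannel(std::size_t cap) : capacity(cap) {}
          std::size_t credits() const { return capacity - slots.size(); }
          bool write(std::vector<uint8_t> signal) { // producer side
              if (credits() == 0) return false;     // back-pressure, never drop
              slots.push_back(std::move(signal));
              return true;
          }
          std::optional<std::vector<uint8_t>> read() {  // consumer side
              if (slots.empty()) return std::nullopt;
              auto s = std::move(slots.front());
              slots.pop_front();
              return s;
          }
      };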
  • FIG. 3 depicts the notion of FlowTunnels in accordance with one embodiment of the invention.
  • a FlowTunnel 101 is a FlowLogic sub-design that bridges communications between two clusters of FlowLogic. While FlowLogic clusters are optimized for implementation on semiconductor devices with over-provisioned internal communication paths, FlowTunnels encapsulate relatively lower bandwidth communication paths, such as serial interfaces between sub-systems. For example, in the preferred embodiment, one cluster corresponds to the portion of the FlowLogic description that is implemented on a host processor. This cluster is connected to another cluster, implemented on an in-memory processing device, communicating over a PCI Express link. FlowLogic Clusters implemented on separate semiconductor dies and/or devices likewise communicate with each other over relatively lower bandwidth FlowTunnels. The functionality of a FlowTunnel comprises buffering, re-synchronization, coalescing, and priority-based scheduling, as sketched below.
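  • The sketch below illustrates two of those roles on the sending side of a hypothetical FlowTunnel: Signals are buffered per priority class, then coalesced, highest class first, into frames sized for the narrower link. Framing, re-synchronization, and the link itself are elided, and all names are assumptions:

      #include <array>
      #include <cstddef>
      #include <cstdint>
      #include <deque>
      #include <vector>

      struct TunnelSignal {
          uint8_t priorityClass;                // 0 is highest of 4 classes
          std::vector<uint8_t> bytes;
      };

      class FlowTunnelTx {
          std::array<std::deque<TunnelSignal>, 4> lanes;  // buffering per class
      public:
          void submit(TunnelSignal s) {
              lanes.at(s.priorityClass).push_back(std::move(s));
          }
          // Coalesce queued Signals into one frame for the narrow link,
          // serving higher priority classes first.
          std::vector<uint8_t> nextFrame(std::size_t maxBytes) {
              std::vector<uint8_t> frame;
              for (auto& lane : lanes) {
                  while (!lane.empty() &&
                         frame.size() + lane.front().bytes.size() <= maxBytes) {
                      auto& b = lane.front().bytes;
                      frame.insert(frame.end(), b.begin(), b.end());
                      lane.pop_front();
                  }
              }
              return frame;
          }
      };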
  • FIG. 4 is a simplified schematic diagram illustrating a logical view for the execution of a FlowModule in accordance with one embodiment of the invention. It should be appreciated that the embodiments described herein take an object and translate that into a FlowModule, which is then further translated into a FlowVirtualMachine (FVM).
  • FlowVirtualMachine 100 represents the execution model of a FlowModule.
  • FVM 100 includes FlowGateIndex 120 .
  • a Signal will hit a FlowModule, and through the FlowGateIndex it is determined which FlowGate to execute.
  • the data within the Signal itself identifies the FlowGate to invoke.
  • StackMemory 122, CodeMemory 124, StateMemory 126, OutputBuffer 128, and ChannelMemory 130 are further included in FVM 100.
  • OutputBuffer 128 is a relatively small memory area for temporarily staging outgoing Signals.
  • ChannelMemory 130 is on the input side for receiving messages into FVM 100. It should be appreciated that each portion of the memories within FVM 100 is shared or aggregated by FlowGates, with the exception of CodeMemory 124. Thus, when a Signal hits a FlowGate, as mentioned above, there is a pointer to invoke the FlowGate code. It should be appreciated that FIG. 4 depicts a model that directly determines the characteristics required for mapping to a field-configurable semiconductor device.
  • the variable components of an FVM are its memory partitions and their contents; by varying these, any FlowModule can be mapped onto and executed by the FVM.
  • the code related to FlowGates and FlowMethods is compiled into relocatable machine code, which in turn determines the logical size of the corresponding FVM CodeMemory.
  • the FlowGateIndex contains a jump table indexed on a unique FlowGate identifier, along with a pointer to the FlowGate code and other context data needed for proper FlowGate execution.
  • the StackMemory is used for storing intermediate states as required during the FlowGate execution. There are no register files in the FVM. The working of the FVM is analogous to that of a stack machine. The Stack is always empty before a FlowGate starts since the FlowGate by itself does not have a persistent state, and the FlowGate is not allowed to suspend.
  • the size or depth of the Stack is determined at compile-time by the FlowLogic compiler. As may be evident, the FlowLogic programming style does not support nested calls or recursive function calls whose depths are not predictable at compile-time. Furthermore, there is no dynamic allocation or garbage collection in FlowLogic because memory resource allocations are fixed at compile-time. Other than temporary variables whose lifetimes span the FlowGate call, State variables are all pre-allocated at compile-time. The size of the StateMemory 126 for an FVM is well known at compile time. OutputBuffer 128 and ChannelMemory 130 are managed by the run-time system and are visible to the system designer only via annotation in one embodiment.
  • ChannelMemory 130 hosts the Channels and is as large as is required by the corresponding FVM. It is useful to point out at this time that although these memories have different access data paths, the memories all use the same resource types in the structurally configurable in-memory processing array. In fact, memories are the only resources directly allocated in the array, with other necessary logic, including processing elements, being fixed to such memory resources.
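  • Taken together, these partitions can be summarized as a compile-time layout record drawn from one pool of identical memory resource units, as in the hedged C++ sketch below; the unit size and field names are illustrative assumptions:

      #include <cstddef>

      constexpr std::size_t kUnitBytes = 4096;   // assumed resource-unit size

      struct FvmLayout {                // all sizes fixed at compile time
          std::size_t codeUnits;        // relocatable FlowGate/FlowMethod code
          std::size_t stateUnits;       // pre-allocated State variables
          std::size_t stackUnits;       // worst-case FlowGate stack depth
          std::size_t outputUnits;      // staging area for outgoing Signals
          std::size_t channelUnits;     // Channels hosted on the input side
          std::size_t totalUnits() const {
              return codeUnits + stateUnits + stackUnits
                   + outputUnits + channelUnits;
          }
      };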
  • FIG. 5 is a simplified schematic diagram illustrating the ability to aggregate several FlowModules into one aggregate structure in accordance with one embodiment of the invention.
  • multiple FVMs are aggregated and placed into what is referred to as a FlowTile.
  • Aggregate FVM 132 has a structural representation similar to that of an individual FVM, i.e., FlowGateIndex 120 a, StackMemory 122 a, CodeMemory 124 a, StateMemory 126 a, OutputBuffer 128 a, and ChannelMemory 130 a.
  • Module pointers (MP) x, y, and z point to the corresponding StateMemory areas of the aggregated FlowModules.
  • FlowGateIndex 120 a will now index into the CodeMemory, as well as the StateMemory, since multiple FlowModules have been aggregated together. It should be appreciated that the ability to aggregate several concurrent FlowModules into one aggregate is a distinguishing factor behind the FVM architecture.
  • the StackMemory size is the maximum of the StackMemory sizes of the individual FVMs.
  • CodeMemory 124 a is the sum of the code memories of the aggregated FVMs. However, in one embodiment, CodeMemory 124 a may be shared among different FlowModules, resulting in a total size that is smaller than the sum.
  • CodeMemory 124 a may even contain a single code copy shared among multiple instances.
  • OutputBuffer 128 a and the ChannelMemory 130 a blocks are managed by the run-time system, in a fashion largely transparent to the application.
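  • Reusing the hypothetical FvmLayout record from the earlier sketch, the aggregation rules just described might be expressed as follows (code sharing, which can shrink the summed CodeMemory, is noted in a comment but not modeled):

      #include <algorithm>
      #include <vector>

      FvmLayout aggregate(const std::vector<FvmLayout>& members) {
          FvmLayout a{};
          for (const FvmLayout& m : members) {
              a.codeUnits    += m.codeUnits;  // upper bound; sharing may shrink it
              a.stateUnits   += m.stateUnits;
              a.stackUnits    = std::max(a.stackUnits, m.stackUnits); // max, not sum
              a.outputUnits  += m.outputUnits;
              a.channelUnits += m.channelUnits;
          }
          return a;
      }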
  • FIG. 6 is a high-level schematic diagram illustrating a FlowTile that supports a corresponding set of virtual processors in accordance with one embodiment of the invention.
  • the FlowTile is composed of aggregate FVMs 132 a, 132 b, and 132 c.
  • Run-time system 134 functions to determine which Signal is associated with which FlowGate.
  • run-time system 134, which may be referred to as a kernel, coordinates the flow of Signals within the FlowTile.
  • Commute element 136 functions to move Signals into and out of the FlowTile.
  • Commute element 136 may be thought of as an interface or a router for the various Signals being transmitted.
  • the router functionality is illustrated here as being internal to the system; however, the router functionality may alternatively be external to the FlowTile in another embodiment.
  • multiple AFVMs are mapped to a FlowTile that supports a corresponding set of virtual processors.
  • a FlowTile is a physical entity that has a certain total number of memory resource units. The sum of the resources required by the AFVMs cannot exceed this total. Within this constraint, memory units can be mapped flexibly to serve the functionality of the constituent FlowModules.
  • a FlowTile has a corresponding Runtime System, which coordinates the flow of Signals within the FlowTile.
  • the Commute element is responsible for moving Signals out of the OutputBuffer and into the corresponding ChannelMemory.
  • FIG. 7 is a high-level simplified schematic illustrating an architectural view of a FlowLogicMachine in accordance with one embodiment of the invention.
  • each FlowTile 140 a through 140 n is connected to in-memory Signal router 142 through corresponding commute elements 136 a through 136 n.
  • in-memory Signal router 142 performs routing functionality within the chip that the FlowLogicMachine is designed for.
  • the coordination of Signals is performed by run-time systems 134 a through 134 n, respectively.
  • FlowTiles 140 a - n are connected to the application independent in-memory router 142 for routing Signals within the FlowLogicMachine.
  • Memory router 142 includes Commute elements 136 a - n associated with every FlowTile.
  • the in-memory router 142 is sufficiently over-provisioned to ensure that Signals flow out of the OutputBuffer and through in-memory router 142 without causing blockages, and with minimal transit time. If there is a blockage, the blockage is constrained to the ChannelMemory, where it manifests as a system characteristic, which can be appropriately alleviated at the level of the FlowLogic design representation. As mentioned above, the router functionality may also be performed externally.
  • the run-time system ensures that Signals are created only if the receiving Channel has sufficient credits, ensuring that worst-case behavior such as deadlock or over-run does not occur.
  • Commute elements 136 a - n further break Signals up into small flow control digits (Flits), ensuring that end-to-end latency is not sensitive to Signal size or the number of hops; a sketch of this send path follows.
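  • A hedged C++ sketch of that send path: a Signal is admitted only when the receiving Channel has enough credits, then cut into fixed-size Flits so latency does not grow with Signal size. The flit size, credit unit, and field names are assumptions for illustration:

      #include <algorithm>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      constexpr std::size_t kFlitBytes = 16;       // assumed flit payload size

      struct Flit {
          uint32_t channelId;
          bool     last;                           // marks the Signal's tail
          std::vector<uint8_t> bytes;              // at most kFlitBytes
      };

      // Returns the flits for one Signal, or nothing if credits are short
      // (refusing up front is what prevents deadlock and over-run).
      std::vector<Flit> segment(uint32_t channelId,
                                const std::vector<uint8_t>& signal,
                                std::size_t& channelCredits) {
          std::vector<Flit> flits;
          std::size_t needed = (signal.size() + kFlitBytes - 1) / kFlitBytes;
          if (needed > channelCredits) return flits;
          channelCredits -= needed;                // one credit per flit (assumed)
          for (std::size_t off = 0; off < signal.size(); off += kFlitBytes) {
              std::size_t n = std::min(kFlitBytes, signal.size() - off);
              Flit f;
              f.channelId = channelId;
              f.last = (off + n == signal.size());
              f.bytes.assign(signal.begin() + off, signal.begin() + off + n);
              flits.push_back(std::move(f));
          }
          return flits;
      }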
  • FIG. 8 is a simplified schematic diagram illustrating the data flow within a FlowLogicMachine in accordance with one embodiment of the invention.
  • the OutputBuffer of a FlowTile originating a Signal will forward that Signal to the Commute element, where a larger message may be broken up into smaller chunks and passed through intermediate stages. These smaller chunks are then accumulated in the ChannelMemory of the FlowTile consuming the data, in one embodiment.
  • Signals from the OutputBuffer are independently read out by the Commute element and segmented into Flits, which are the flow control digits or primitives.
  • FIG. 9 is a simplified schematic diagram illustrating a FlowTile having an Adapter that interfaces the FlowTile with an external device.
  • FlowTile 140 is in communication with Adapter 144.
  • Adapter 144 can provide an interface for chip-to-chip communication in accordance with one embodiment.
  • Adapter 144 may provide a packet interface in order to transfer packets between devices. It should be appreciated that the Adapter can be designed so that the interface is application-specific.
  • some of the FlowTiles, e.g., on the periphery of the array, are configured to interface with the external world, e.g., with other chips.
  • the interface for the external world is also a Signal based interface that is accomplished through Adapter 144, as shown in FIG. 9.
  • the FlowLogicMachine can itself be thought of as an array of structurally configurable memory units that implements a plurality of FlowTiles, where the computational logic is fixed and distributed.
  • the FlowLogic language described herein may be thought of as analogous to the Java language, while the FlowLogicMachine may be analogized to the Java Virtual Machine, since the FlowLogic language has some attributes of object-oriented programming languages.
  • the resources in question are memory units in one form or another, i.e., code, state, stack, channels, and buffer.
  • the FlowLogicMachine is designed to provide the ability to configure these memory units, also referred to as memory resources, as required by a particular application and the FlowLogic representation allows the flexibility of re-casting a system description in flexible ways to achieve the targeted capacity, performance, and functionality.
  • the overhead of credit-based flow control management is tunable at the FlowLogic design representation level by providing an adequate Channel-sizing attribute.
  • the FlowLogicMachine has novel features that help in system diagnosis, among other things. FlowGates are by design atomic and always run to completion once fired. There is no notion of run-time instruction-level single-stepping in the context of the FlowLogicMachine; instead, execution can be stepped on FlowGate boundaries. FlowTiles can be instructed to execute one FlowGate at a time. An external debug controller can observe the StateMemory, ChannelMemory, and other partitions of the FVM by making explicit system read calls when the FlowLogicMachine is paused between steps of FlowGate execution. The debug controller may even launch DynamicFlowGates to achieve diagnostic goals.
  • the FlowLogicMachine has built-in FlowGates called SystemFlowGates for read, write and configuration purposes. The SystemFlowGates come into existence on device boot, independent of applications. These SystemFlowGates are also used for booting application-specific FVMs.
  • the embodiments described herein also support runtime debugging of the FlowLogicMachine.
  • the FlowLogic runtime system can be controlled from an outside machine (host) through the sending and receiving of Signals with specific debugging payloads.
  • the host sends debugging commands to the runtime system in Signals; it also receives data and state information back from the runtime system in Signals, as sketched below.
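  • The patent states only that commands and state travel as Signals; under that constraint, a host-side helper might look like the hedged C++ sketch below, with the command set and byte encoding invented purely for illustration:

      #include <cstdint>
      #include <vector>

      enum class DebugOp : uint8_t {
          Pause,          // stop at the next FlowGate boundary
          StepFlowGate,   // execute exactly one FlowGate, then pause
          ReadMemory,     // query StateMemory/ChannelMemory while paused
          Resume
      };

      struct DebugCommand {
          DebugOp  op;
          uint32_t tileId;    // target FlowTile
          uint32_t address;   // used by ReadMemory
          uint32_t length;    // bytes requested
      };

      // Pack a command into a Signal payload (little-endian, assumed format).
      std::vector<uint8_t> toSignalPayload(const DebugCommand& c) {
          std::vector<uint8_t> p{static_cast<uint8_t>(c.op)};
          for (uint32_t v : {c.tileId, c.address, c.length})
              for (int i = 0; i < 4; ++i)
                  p.push_back(static_cast<uint8_t>(v >> (8 * i)));
          return p;
      }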
  • FlowLogic is not a general method for describing any digital system for system-on-chip implementation. Some of its notable distinctions include:
  • FlowLogic relies on the assumption that quantitative behavior at the FlowLogic level is perturbed minimally as it is translated to the physical implementation.
  • the embodiments described above provide a memory centric approach for a processing system design and architecture, as well as the FlowLogic language for designing, synthesizing, and placing and routing techniques for this unique processing system design.
  • Terms of the FlowLogic language have been analogized to some object oriented terms for ease of understanding.
  • a FlowGate may be thought of as a Function, Procedure or Task, while a FlowModule may be analogized to an object in object oriented programming.
  • a Signal may be referred to as a message or a packet. It should be appreciated that while these analogies are used for explanatory purposes, there are significant differences between the embodiments described herein and the corresponding analogies.
  • FIG. 10 is a flowchart diagram illustrating the method operations for configuring and programming a semiconductor circuit device having a multiple level array of memory storage cells in accordance with one embodiment of the invention.
  • the method initiates with operation 400 where the initial FlowLogic source code is provided.
  • the FlowLogic source code is parsed.
  • in decision operation 404, it is determined whether any errors exist in the source code, e.g., syntax errors. Since FlowLogic supports a subset of C++ in one embodiment, it should be appreciated that this check will reveal any syntax issues. If an error does exist, the method returns to operation 400, where the error is corrected, and the method resumes as described above.
  • if no errors exist, the method advances to operation 406, where the FlowLogic source code is in a state in which some of the code is in a C++ format.
  • the FlowLogic modules are instantiated through an elaboration process.
  • the source code having a description of a network is converted to code representing FlowLogic instances, i.e., a network of instances is provided. This results in the FlowLogic Instance source code as represented in operation 410.
  • the FlowLogic Instances are compiled into corresponding FVMs.
  • the compiled FVMs are checked for compile errors in operation 414. If there are compile errors found in operation 414, then the method returns to operation 400 and repeats as described above. If there are no compile errors, then the compiled FVMs are made available in operation 416.
  • the compiled FVMs are input into a simulator in operation 418, wherein a functional simulation and an instruction-level simulation are performed. It should be appreciated that the source code from operation 400 is used to provide the function-level simulation, while the compiled FVMs are used to provide the instruction-level simulation.
  • a mapper aggregates the FVMs to AFVMs and maps AFVMs to FLA (FlowLogicArray) Tiles.
  • the AFVMs are mapped into a portion of the multiple level array of memory storage cells.
  • the multi-way access paths of the multiple level array are configured according to the multiple FVMs in operation 420.
  • the portion of the multiple level array is programmed to function according to the multiple FVMs.
  • the method terminates in operation 422 where the FLA (FlowLogicArray) is defined as a chip in silicon.
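  • The flow of FIG. 10 can be summarized as a single driver pipeline, as in the hedged C++ sketch below; every type and function is a placeholder for a stage of the described flow, not a real tool API:

      #include <string>
      #include <vector>

      struct Ast {};  struct InstanceNet {};  struct FvmImage {};
      struct AfvmImage {};  struct TileMap {};  struct FlaConfig {};

      Ast parseFlowLogic(const std::string&) { return {}; }   // parse + syntax check
      InstanceNet elaborate(const Ast&) { return {}; }        // instantiate modules
      std::vector<FvmImage> compileToFvms(const InstanceNet&) { return {}; }
      void simulate(const std::string&, const std::vector<FvmImage>&) {}
      std::vector<AfvmImage> aggregateFvms(const std::vector<FvmImage>&) { return {}; }
      TileMap mapToTiles(const std::vector<AfvmImage>&) { return {}; }  // op. 420
      FlaConfig configureArray(const TileMap&) { return {}; }           // op. 422

      FlaConfig buildFlowLogicArray(const std::string& source) {
          Ast ast = parseFlowLogic(source);        // operations 400-404
          InstanceNet net = elaborate(ast);        // FlowLogic instances (op. 410)
          std::vector<FvmImage> fvms = compileToFvms(net);   // ops 412-416
          simulate(source, fvms);                  // functional + instruction level
          TileMap tiles = mapToTiles(aggregateFvms(fvms));   // mapper (op. 420)
          return configureArray(tiles);            // program the array (op. 422)
      }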
  • the invention may employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms, such as producing, identifying, determining, or comparing.
  • the invention also relates to a device or an apparatus for performing these operations.
  • the apparatus may be specially constructed for the required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer.
  • various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
  • the invention can also be embodied as computer readable code on a computer readable medium.
  • the computer readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices.
  • the computer readable medium can also be distributed over a network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Logic Circuits (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)
  • Devices For Executing Special Programs (AREA)
  • Semiconductor Memories (AREA)
  • Multi Processors (AREA)

Abstract

A method for specifying stateful, transaction-oriented systems is provided. The method initiates with designating a plurality of primitive FlowModules. The method includes defining at least one FlowGate within each of the plurality of FlowModules, wherein each FlowGate includes a non-interruptible sequence of procedure code, a single point of entry and is invoked by a named concurrent call. An Arc is designated from a calling FlowGate to a called FlowGate and a Signal is generated for each named invocation of the called FlowGate. A Channel is defined for carrying the Signal. Methods for synthesizing a semiconductor device and routing signals in the semiconductor device are provided.

Description

    CLAIM OF PRIORITY
  • The present application is a divisional application of U.S. application Ser. No. 11/426,882, filed Jun. 27, 2006, and claims priority under 35 U.S.C. §119(e) from U.S. Provisional Patent Application No. 60/694,538, filed Jun. 27, 2005, U.S. Provisional Patent Application No. 60/694,546, filed Jun. 27, 2005, and U.S. Provisional Patent Application No. 60/694,537, filed Jun. 27, 2005, all of which are incorporated by reference in their entirety for all purposes. The present application is related to U.S. Pat. No. 7,676,783, issued Mar. 9, 2010 (Atty Docket ARITP001), entitled APPARATUS FOR PERFORMING COMPUTATIONAL TRANSFORMATIONS AS APPLIED TO IN-MEMORY PROCESSING OF STATEFUL, TRANSACTION ORIENTED SYSTEMS, and U.S. Pat. No. 7,614,020, issued Nov. 3, 2009 (Atty Docket ARITP003), entitled STRUCTURALLY FIELD-CONFIGURABLE SEMI-CONDUCTOR ARRAY FOR IN-MEMORY PROCESSING OF STATEFUL, TRANSACTION-ORIENTED SYSTEMS, each of which is incorporated by reference in its entirety for all purposes.
  • BACKGROUND
  • System on a chip (SOC) implementation is predominantly based on design capture at the register-transfer level using design languages such as Verilog and VHDL, followed by logic synthesis of the captured design and placement and routing of the synthesized netlist in physical design. Current efforts to improve design productivity have aimed at design capture at a higher level of abstraction, via more algorithmic/system approaches such as C++, C, SystemC and System Verilog.
  • As process technology advances, physical design issues such as timing closure and power consumption management have dominated the design cycle time as much as design capture and verification. Methodology advances currently in development and under consideration for adoption, using higher levels of abstraction in design capture, do not address these physical design issues or manufacturability issues. It is recognized in the semiconductor industry that, with process technologies at 90 nm and below, physical design issues will have even more significant cost impacts on design cycle time and product quality.
  • CAD tools for placement and route of synthesized logic netlists have delivered limited success in addressing the physical design requirements of deep submicron process technologies. To take full advantage of deep submicron process technology, the semiconductor industry needs a design methodology and a supporting tool suite that can improve productivity through the entire design cycle, from design capture and verification through physical design, while guaranteeing product manufacturability at the same time. It is also well-known in the semiconductor industry that SOC implementations of stateful, transaction-oriented applications depend heavily on on-chip memory bandwidth and capacity for performance and power savings. Placement and routing of a large number of memory modules becomes another major bottleneck in SOC physical design.
  • Another important requirement for an advanced SOC design methodology for deep submicron process technology is to allow integration of on-chip memory with significant bandwidth and capacity without impacting product development schedule or product manufacturability. High level design capture, product manufacturability, and support for significant memory resources are also motivating factors in the development of processor-in-memory architectures. Processor-in-memory architectures are driven by requirements to support advanced software programming concepts such as virtual memory, global memory, dynamic resource allocation, and dynamic load balancing. The hardware and software complexity and costs of these architectures are justified by the requirement to deliver good performance for a wide range of software applications. Due to these overheads, multiple processor-in-memory chips are required in any practical system to meet realistic performance and capacity requirements, as witnessed by the absence of any system product development incorporating a single processor-in-memory chip package.
  • There is thus an added requirement for cost effective SOC applications that resource management in processor-in-memory architectures be completely controllable by the designer through program structuring and annotations, and compile-time analysis. It is also important to eliminate all cost and performance overheads in software and hardware complexity attributed to the support of hierarchical memory systems. Based on these observations, there is a need in the semiconductor industry for a cost-effective methodology to implementing SOCs for stateful, transaction-oriented applications.
  • SUMMARY
  • Broadly speaking, the present invention fills these needs by providing a method and apparatus for performing in-memory computation for stateful, transaction-oriented applications. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, or a device. Several inventive embodiments of the present invention are described below.
  • In one embodiment, a method for specifying stateful, transaction-oriented systems is provided. The method initiates with designating a plurality of primitive FlowModules. The method includes defining at least one FlowGate within each of the plurality of FlowModules, wherein each FlowGate includes a non-interruptible sequence of procedure code and a single point of entry, and is invoked by a named concurrent call. An Arc is designated from a calling FlowGate to a called FlowGate and a Signal is generated for each named invocation of the called FlowGate. A Channel is defined for carrying the Signal.
  • In another embodiment, a method for synthesizing a stateful, transaction-oriented system for flexible mapping to a structurally field-configurable semiconductor device having a multi-level array of storage elements, for in-memory processing is provided. The method initiates with mapping FlowLogic to a network of FlowVirtualMachines (FVM). A FlowModule is mapped into a corresponding FlowVirtualMachine (FVM) and one or more FVMs are integrated into an AggregateFVM (AFVM). One or more AFVMs are composed into a FlowTile, and Signals are routed between FlowModules.
  • In yet another embodiment, a method for routing FlowLogic Signals over a structurally configurable in-memory processing array is provided. The method initiates with configuring a pool of memory resource units into corresponding OutputBuffers, CommuteBuffers and ChannelMemories, the pool of memory units shared with a FlowLogicMachine. A producer-consumer relationship between the corresponding OutputBuffers and CommuteBuffers is configured and a producer-consumer relationship between the CommuteBuffers and VirtualChannels residing in the ChannelMemories is configured. Producer-consumer relationships between the OutputBuffers and VirtualChannels residing in said ChannelMemories are configured and producer-consumer relationships between the CommuteBuffers and neighboring CommuteBuffers are configured.
  • In still yet another embodiment, a method for debugging a stateful, transaction-oriented runtime system having a multi-level array of storage elements is provided. The method includes instructing the stateful transaction oriented system to pause and instructing the stateful transaction oriented system to single step until a given point. Information for selected FlowGate invocations is tracked and areas within the multi-level array of storage elements are queried for the debugging session.
  • Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, and like reference numerals designate like structural elements.
  • FIG. 1 is a high-level simplified schematic diagram of flow modules in accordance with one embodiment of the invention.
  • FIG. 2 is a simplified schematic diagram illustrating the data path of a compute element of a tile in accordance with one embodiment of the invention.
  • FIG. 3 depicts the notion of FlowTunnels in accordance with one embodiment of the invention.
  • FIG. 4 is a simplified schematic diagram illustrating a logical view for the execution of a FlowModule in accordance with one embodiment of the invention.
  • FIG. 5 is a simplified schematic diagram illustrating the ability to aggregate several flow modules into one aggregate structure in accordance with one embodiment of the invention.
  • FIG. 6 is a high-level schematic diagram illustrating a tile that supports a corresponding set of virtual processors in accordance with one embodiment of the invention.
  • FIG. 7 is a high-level simplified schematic illustrating an architectural view of a FlowLogicMachine in accordance with one embodiment of the invention.
  • FIG. 8 is a simplified schematic diagram illustrating the data flow within a flow logic machine in accordance with one embodiment of the invention.
  • FIG. 9 is a simplified schematic diagram illustrating a Tile having an adapter that interfaces the Tile with an external device in accordance with one embodiment of the invention.
  • FIG. 10 is a flowchart diagram illustrating the method operations for configuring and programming a semiconductor circuit device having a multiple level array of memory storage cells in accordance with one embodiment of the invention.
  • DETAILED DESCRIPTION
  • An invention is described for a structurally reconfigurable intelligent memory device for efficient implementation of stateful, transaction-oriented systems in silicon. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be obvious, however, to one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
  • The embodiments of the present invention described below provide a method and apparatus enabling a flexible design capture methodology, which allows a designer to select the granularity at which a stateful, transaction-oriented application is captured. An efficient methodology to implement a stateful, transaction-oriented application on a platform economically superior with respect to design effort, implementation costs and manufacturability is further described below. The embodiments utilize an execution model that allows for efficient compiler optimization and resource allocation, efficient hardware implementation, and accurate performance analysis and prediction when a design is captured and analyzed. It should be appreciated that no significant uncertainty is introduced by design compilation, mapping into the physical platform, or resource conflicts during system operation. The resource requirements are specified explicitly when the design is captured, using annotations or compiler analysis. Allocation of hardware resources can be determined statically at compile time.
  • In another aspect of the invention a simple and effective chip architecture that uses a single level real memory organization to eliminate the costs of managing a caching hierarchy associated with virtual memory systems in applications development, compiler optimization, run-time system support, and hardware complexity is provided. As will be explained in more detail below, the embodiments described herein meet the tremendous demands of memory capacity and bandwidth in future generation SOCs with solutions that are economical in die area, product development cycle and power consumption. At the same time, the embodiments reap the cost, performance and power consumption benefits of advanced deep submicron fabrication processes with exceedingly high manufacturability and reliability.
  • FIG. 1 is a high-level simplified schematic diagram of FlowModules in accordance with one embodiment of the invention. FlowModules 100 a through 100 d represent objects in accordance with one embodiment of the invention. FlowModules 100 a through 100 d are mostly comprised of memory arrays in this embodiment. FlowModule 100 a includes FlowMethod 104, States 106, and FlowGate 112. Signals 102 are processed and commuted between FlowModules through FlowGates 112. Signals 102, which may be referred to as messages, are in packet format in one embodiment of the invention. The primary inputs and outputs into the FlowLogic architecture are also Signals. Arc 108 represents a channel through which data flows between FlowGates 112. In one embodiment, Arcs 108 represent queues and Signals 102 are transmitted through Arcs 108. FlowModules 100 represent objects, defining codes and data allocated to memory. FIG. 1 further illustrates FlowModule 100 a and FlowModule 100 b within hierarchical FlowModule 110. FlowModule 100 a and FlowModule 100 b are grouped within hierarchical FlowModule 110 for convenience in one embodiment. In other words, the grouping of FlowModule 100 a and FlowModule 100 b may be analogized to an alias. Arcs 108 may be characterized as a ForwardArc 108 a, a CallForwardArc 108 b or a TimedArc 108 c in one embodiment. The details for these types of Arcs are provided below. It should be appreciated that Arcs 108 are created in application-specific fashion. FlowGates 112 are invoked through an external Signal and are akin to a function call.
  • Still referring to FIG. 1, PrimitiveFlowModules, henceforth referred to as FlowModules 100 a-d, are concurrent entities that include FlowGates 112, States 106, and FlowMethods 104. Arcs 108 emanate from a FlowGate and terminate at a FlowGate. An Arc can carry one or more Signals at a given time. A FlowGate is invoked by a Signal instance, i.e., a Signal instance is targeted to invoke a specific FlowGate. In one embodiment, a Signal instance is a stream of bytes that carries the necessary arguments, which may be a small message, a large packet, or anything in between. A Signal also may carry a priority-class attribute. Signals within a class (priority-class) are guaranteed to arrive in the order they were generated at the head of the Arc. It should be appreciated that FlowGate 112 does not have a state of its own. FlowGate 112 can modify the state of the FlowModule it resides in, and the FlowGates may generate one or more Signals and thereby invoke one or more FlowGates concurrently. In one embodiment, FlowGate 112 may be thought of as an indivisible and un-interruptible sequence of procedural code that typically terminates after a short burst of execution. FlowLogic guarantees that one and only one FlowGate within a FlowModule is active at any time, and a FlowGate, once started, is guaranteed to complete. FlowMethods, as used herein, are sequential bodies of code, e.g., C-style function calls, that the FlowGates within a FlowModule may use to achieve their end goals. Hierarchical FlowModules comprise one or more FlowModules 100 a-d, and are largely used to facilitate FlowLogic code reuse and interface specification exchange. A TimedArc is a special case of an Arc, where the constituent Signals carry a Timer. The constituent Signals in a TimedArc invoke the corresponding FlowGate out of order, as and when the Timer expires. In one embodiment, TimedArcs are specifically constrained to originate and terminate within the same FlowModule. A ForwardArc is another special case of an Arc 108 whose destination is implied by Signals carried by a CallForwardArc. It should be appreciated that the notion of threads or processors does not exist in the FlowLogic description. FlowLogic can be thought of as a set of interacting pipelines of Signal flows.
  • One skilled in the art will appreciate from FIG. 1 that the FlowLogic architecture can be used to describe an arbitrary transaction-oriented application using an arbitrary number of interconnected FlowLogic components. Isochronous systems can also be described with reasonable timing resolution. It should be noted that FlowLogic is not meant for traditional digital logic system design where cycle accuracy and deterministic behavior is paramount. Systems designed using FlowLogic are non-deterministic, but can have well-known end-to-end functional behavior independent of the delays in the Arc. Arcs are guaranteed not to drop Signals unless they are attributed specifically to do so. The quantitative or performance behavior of the system may change depending on the parameters of the Arcs, including delay (latency), capacity, priority and so forth.
  • The FlowLogic architecture allows flexible design space exploration of performance and quantitative behavior, followed by flexible mapping of the results onto the structurally field-configurable semiconductor device. The parameters related to Arcs 108, among others, are determined interactively during system simulations using FlowLogic. It should be noted that the performance behavior of such systems will only be as good as the traffic pattern assumptions made in the simulation. In one embodiment, FlowGates referred to as DynamicFlowGates can be dynamically loaded and linked at run-time. In one embodiment, DynamicFlowGates are limited to serving the purposes of run-time system diagnostics and debug. Thus, an overview of the FlowLogic system and language has been provided above, and further details are provided with reference to the Figures below.
  • FIG. 2 shows an alternative structural view of the FlowLogic system in accordance with one embodiment of the invention. FlowModules 100 a through 100 d are interconnected through a set of Arcs or Channels. These Arcs or Channels of FIG. 2 may be classified as Random Read Channels 116, Priority Class Channels 114, or Random Access Channels 118, in accordance with one embodiment of the invention. As mentioned above, the FlowModules are mainly composed of memory regions, and Channels 114, 116, and 118 provide the wiring for communication between these memory regions. It should be appreciated that Channels of different types and capacities are inferred interactively from a FlowLogic description via annotations. For example, Signal types carry attributes that determine the range of priority-class, type, and capacity of the Channel. A set of Arcs between two FlowModules maps into one or more virtual Channels depending on the Signal types that the Arcs carry. A Channel can be thought of as a uni-directional memory element with FlowMethods for producer writes, consumer reads, synchronization, and flow control. In the simplest case, a Channel may be a first-in-first-out (FIFO) serial queue. In another embodiment, a Channel may be serial-write, random-read for the purposes of filtering and classification functions. In yet another embodiment, a Channel may provide random-write and random-read ability to exchange semaphores.
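As a concrete illustration of the simplest Channel variant named above, the following C++ sketch models a FIFO Channel as a uni-directional memory element with producer-write and consumer-read FlowMethods and a capacity test standing in for flow control. The class name, the credit test, and the byte-vector payload are assumptions for illustration; the serial-write/random-read and random-write/random-read variants would expose additional indexed accessors.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// One virtual Channel in its simplest (FIFO) form; purely illustrative.
class FifoChannel {
    std::deque<std::vector<uint8_t>> slots_;  // backing memory region
    std::size_t capacity_;                    // fixed when the Channel is sized
public:
    explicit FifoChannel(std::size_t capacity) : capacity_(capacity) {}
    bool hasCredit() const { return slots_.size() < capacity_; }
    bool write(std::vector<uint8_t> signal) { // producer-side FlowMethod
        if (!hasCredit()) return false;       // refuse rather than drop
        slots_.push_back(std::move(signal));
        return true;
    }
    bool read(std::vector<uint8_t>& out) {    // consumer-side FlowMethod
        if (slots_.empty()) return false;
        out = std::move(slots_.front());
        slots_.pop_front();
        return true;
    }
};

int main() {
    FifoChannel ch(2);                 // capacity chosen at configuration time
    ch.write({1, 2, 3});               // producer writes one Signal
    std::vector<uint8_t> sig;
    return (ch.read(sig) && sig.size() == 3) ? 0 : 1;
}
```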
  • FIG. 3 depicts the notion of FlowTunnels in accordance with one embodiment of the invention. A FlowTunnel 101 is a FlowLogic sub-design that bridges communications between two clusters of FlowLogic. While FlowLogic clusters are optimized for implementation on semiconductor devices with over-provisioned internal communication paths, FlowTunnels encapsulate relatively lower-bandwidth communication paths, such as serial interfaces between sub-systems. For example, in the preferred embodiment, one cluster corresponds to the portion of the FlowLogic description that is implemented on a host processor; this cluster is connected, over a PCI Express link, to another cluster implemented on an in-memory processing device. FlowLogic clusters implemented on separate semiconductor dies and/or devices likewise communicate with each other over relatively lower-bandwidth FlowTunnels. The functionality of a FlowTunnel comprises buffering, re-synchronization, coalescing, and priority-based scheduling.
  • FIG. 4 is a simplified schematic diagram illustrating a logical view for the execution of a FlowModule in accordance with one embodiment of the invention. It should be appreciated that the embodiments described herein take an object and translate it into a FlowModule, which is then further translated into a FlowVirtualMachine (FVM). FlowVirtualMachine 100 represents the execution model of a FlowModule. FVM 100 includes FlowGateIndex 120. In one embodiment, a Signal will hit a FlowModule, and the FlowGateIndex determines which FlowGate to execute. In one embodiment, the data within the Signal itself identifies the FlowGate to pick up. StackMemory 122, CodeMemory 124, StateMemory 126, OutputBuffer 128, and ChannelMemory 130 are further included in FVM 100. OutputBuffer 128 is a relatively small memory area for temporarily staging outgoing Signals. ChannelMemory 130 is on the input side for receiving messages into FVM 100. It should be appreciated that each portion of the memories within FVM 100 is shared or aggregated by FlowGates, with the exception of CodeMemory 124. Thus, when a Signal hits a FlowGate, as mentioned above, there is a pointer to invoke the FlowGate code. It should be appreciated that FIG. 4 depicts a model that directly determines the characteristics required for mapping to a field-configurable semiconductor device. For the purposes of describing the preferred embodiment of this invention, it is sufficient to discuss the architectural aspects of the FVM rather than the details of execution. The variable components of an FVM are its memory partitions and their contents; by varying these, any FlowModule can be mapped onto and executed by the FVM.
  • It should be noted that the sizes of the logical memory partitions in an FVM are arbitrary and that the partitions have physically independent access paths. The code related to FlowGates and FlowMethods is compiled into relocatable machine code, which in turn determines the logical size of the corresponding FVM CodeMemory. The FlowGateIndex contains a jump table indexed on a unique FlowGate identifier, along with the pointer to the FlowGate code, among other context data for proper FlowGate execution. The StackMemory is used for storing intermediate states as required during FlowGate execution. There are no register files in the FVM; the working of the FVM is analogous to that of a stack machine. The Stack is always empty before a FlowGate starts, since the FlowGate does not have a persistent state of its own and is not allowed to suspend.
  • The size, or depth, of the Stack is determined at compile-time by the FlowLogic compiler. As may be evident, the FlowLogic programming style does not support nested or recursive function calls whose depths are not predictable at compile-time. Furthermore, there is no dynamic allocation or garbage collection in FlowLogic, because memory resource allocations are fixed at compile-time. Other than temporary variables whose lifetimes span a FlowGate call, State variables are all pre-allocated at compile-time. The size of the StateMemory 126 for an FVM is thus known at compile time. OutputBuffer 128 and ChannelMemory 130 are managed by the run-time system and are visible to the system designer only via annotation in one embodiment. OutputBuffer 128 is a small memory area for temporarily staging outgoing Signals. ChannelMemory 130, on the other hand, hosts the Channels and is as large as the corresponding FVM requires. It is useful to point out that although these memories have different access data paths, they all use the same resource types in the structurally configurable in-memory processing array. In fact, memories are the only resources directly allocated in the array, with other necessary logic, including processing elements, being fixed to such memory resources.
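The FVM partitioning and dispatch described in the preceding paragraphs can be summarized in a short sketch. The C++ below lays out the six logical partitions as plain containers and dispatches an arriving Signal through a FlowGateIndex jump table. The map-based index, the gate-id key, and the lambda gate body are illustrative assumptions; in the real device each partition has a physically independent access path and a compile-time-fixed size.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Sketch of one FVM's logical memory partitions and Signal dispatch.
struct FlowVirtualMachine {
    using Gate = std::function<void(FlowVirtualMachine&,
                                    const std::vector<uint8_t>&)>;
    std::unordered_map<uint32_t, Gate> flowGateIndex; // jump table on gate id
    std::vector<uint8_t> stackMemory;    // scratch for one gate invocation
    std::vector<uint8_t> codeMemory;     // relocatable machine code (opaque)
    std::vector<uint8_t> stateMemory;    // State variables, pre-allocated
    std::vector<uint8_t> outputBuffer;   // staging for outgoing Signals
    std::vector<uint8_t> channelMemory;  // hosts the inbound Channels

    // The arriving Signal itself identifies the FlowGate to run; the gate
    // executes to completion and leaves the Stack empty afterward.
    void dispatch(uint32_t gateId, const std::vector<uint8_t>& signal) {
        auto it = flowGateIndex.find(gateId);
        if (it != flowGateIndex.end()) it->second(*this, signal);
    }
};

int main() {
    FlowVirtualMachine fvm;
    fvm.stateMemory.resize(64, 0);  // sizes are known at compile time
    fvm.flowGateIndex[7] = [](FlowVirtualMachine& m,
                              const std::vector<uint8_t>& sig) {
        if (!sig.empty()) m.stateMemory[0] = sig[0];  // gate mutates State
    };
    fvm.dispatch(7, {42});
    return fvm.stateMemory[0] == 42 ? 0 : 1;
}
```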
  • FIG. 5 is a simplified schematic diagram illustrating the ability to aggregate several FlowModules into one aggregate structure in accordance with one embodiment of the invention. Here, multiple FVMs are aggregated and placed into what is referred to as a FlowTile. Aggregate FVM 132 has the same structural representation as an individual FVM, i.e., FlowGateIndex 120 a, StackMemory 122 a, CodeMemory 124 a, StateMemory 126 a, OutputBuffer 128 a, and ChannelMemory 130 a. Module pointers (MP) x, y, and z point to the corresponding StateMemory areas of the aggregated FlowModules. It should be appreciated that FlowGateIndex 120 a will now index into the CodeMemory as well as the StateMemory, since multiple FlowModules have been aggregated together. The ability to aggregate several concurrent FlowModules into one aggregate is a distinguishing factor behind the FVM architecture. The StackMemory size is the maximum of the StackMemory sizes of the individual FVMs. CodeMemory 124 a is the sum of the code memories of the aggregated FVMs; however, in one embodiment, CodeMemory 124 a may be shared among different FlowModules, resulting in a total size that is smaller than the sum. In the particular case where multiple FlowModules of the same type are replicated for load sharing, CodeMemory 124 a may even contain a single code copy shared among multiple instances. The OutputBuffer 128 a and ChannelMemory 130 a blocks are managed by the run-time system, in a fashion largely transparent to the application.
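The aggregation rules just stated (StackMemory as a maximum, StateMemory as a sum, CodeMemory as a sum unless shared) reduce to simple arithmetic. The following C++ sketch computes an AFVM's resource needs from its members; the struct fields and the single sharedCode flag are simplifying assumptions, since real code sharing can be partial.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Per-FVM memory needs, in memory resource units (illustrative).
struct FvmSizes { std::size_t stack, code, state; };

// StackMemory is the max of the members' stacks (gates within one aggregate
// are serialized), StateMemory the sum, and CodeMemory the sum unless
// replicated FlowModules of the same type share a single code copy.
FvmSizes aggregate(const std::vector<FvmSizes>& members, bool sharedCode) {
    FvmSizes a{0, 0, 0};
    for (const FvmSizes& m : members) {
        a.stack = std::max(a.stack, m.stack);
        a.state += m.state;
        a.code = sharedCode ? std::max(a.code, m.code) : a.code + m.code;
    }
    return a;
}

int main() {
    std::vector<FvmSizes> members{{8, 100, 32}, {16, 60, 8}};
    FvmSizes a = aggregate(members, false);
    return (a.stack == 16 && a.code == 160 && a.state == 40) ? 0 : 1;
}
```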
  • FIG. 6 is a high-level schematic diagram illustrating a FlowTile that supports a corresponding set of virtual processors in accordance with one embodiment of the invention. In this representation, the FlowTile is composed of aggregate FVMs 132 a, 132 b, and 132 c. Run-time system 134 determines which Signal is associated with which FlowGate; thus, run-time system 134, which may be referred to as a kernel, coordinates the flow of Signals within the FlowTile. Commute element 136 moves Signals into and out of the FlowTile. In one embodiment, Commute element 136 may be thought of as an interface or a router for the various Signals being transmitted. The router functionality is illustrated here as internal to the system; however, it may alternatively be external to the FlowTile in another embodiment. As shown in FIG. 6, multiple AFVMs are mapped to a FlowTile that supports a corresponding set of virtual processors. A FlowTile is a physical entity that has a certain total number of memory resource units, and the sum of the resources required by the AFVMs cannot exceed this total. Within this constraint, memory units can be mapped flexibly to serve the functionality of the constituent FlowModules. A FlowTile has a corresponding Run-time System, which coordinates the flow of Signals within the FlowTile. As mentioned above, the Commute element is responsible for moving Signals out of the OutputBuffer and into the corresponding ChannelMemory.
  • FIG. 7 is a high-level simplified schematic illustrating an architectural view of a FlowLogicMachine in accordance with one embodiment of the invention. Here, each FlowTile 140 a through 140 n is connected to in-memory Signal router 142 through corresponding Commute elements 136 a through 136 n. It should be appreciated that in-memory Signal router 142 performs the routing functionality within the chip for which the FlowLogicMachine is designed. Within each FlowTile 140 a through 140 n, the coordination of Signals is performed by run-time systems 134 a through 134 n, respectively. The application-independent in-memory router 142 routes Signals within the FlowLogicMachine and includes the Commute elements 136 a-n associated with every FlowTile. In one embodiment, in-memory router 142 is sufficiently over-provisioned to ensure that Signals flow out of the OutputBuffer and through the router without blockage, and with minimal transit time. If there is a blockage, the blockage is constrained to the ChannelMemory, where it manifests as a system characteristic that can be appropriately alleviated at the level of the FlowLogic design representation. As mentioned above, the router functionality may also be performed externally. In one embodiment, the run-time system ensures that Signals are created only if the receiving Channel has sufficient credits, ensuring that worst-case behaviors such as deadlock and over-run do not occur. The overhead of credit-based flow control management is tunable at the FlowLogic design representation level by providing adequate Channel sizing attributes. Commute elements 136 a-n further break up Signals into small flow control digits (Flits), ensuring that end-to-end latency is not sensitive to Signal sizes or the number of hops.
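The credit-based rule above, that a Signal is created only when the receiving Channel has credits, can be pictured as a small counter per Channel. The C++ sketch below is an assumption-level illustration; the patent does not specify the credit granularity or the credit return path.

```cpp
#include <cstdint>

// Illustrative credit counter for one receiving Channel. The run-time
// creates a Signal only after reserving a credit, so the Channel can
// never over-run and producers cannot deadlock on a full consumer.
struct ChannelCredits {
    uint32_t credits;             // sized via the Channel's sizing attribute
    bool tryReserve() {           // checked before a FlowGate emits a Signal
        if (credits == 0) return false;
        --credits;
        return true;
    }
    void release() { ++credits; } // a consumer read returns the credit
};

int main() {
    ChannelCredits c{1};
    bool first = c.tryReserve();   // succeeds: one slot available
    bool second = c.tryReserve();  // fails: Signal creation is deferred
    c.release();                   // consumer drains the Channel
    return (first && !second && c.tryReserve()) ? 0 : 1;
}
```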
  • FIG. 8 is a simplified schematic diagram illustrating the data flow within a FlowLogicMachine in accordance with one embodiment of the invention. The OutputBuffer of a FlowTile originating a Signal forwards that Signal to the Commute element, where a larger message may be broken up into smaller chunks and passed through intermediate stages. These smaller chunks are then accumulated in the ChannelMemory, which, in one embodiment, resides in the FlowTile consuming the data. Signals from the OutputBuffer are independently read out by the Commute element and segmented into Flits, the flow control digits or primitives.
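The segmentation step just described can be sketched as follows. The Flit header fields (source id, sequence number, last-flit flag) and the 16-byte flit size are assumptions chosen so that Flits arriving interleaved from different sources can be segregated and reassembled in ChannelMemory, as the text requires; the actual format is not specified at this level.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::size_t kFlitBytes = 16;  // assumed flit size, not from the patent

// One flow control digit (Flit) of a Signal, tagged so that interleaved
// Flits of different Signals can be segregated by source and reassembled
// in order in ChannelMemory.
struct Flit {
    uint16_t sourceId;
    uint16_t seq;
    bool last;                   // marks the tail Flit of the Signal
    std::vector<uint8_t> data;
};

// Segmentation as performed by a Commute element reading the OutputBuffer.
std::vector<Flit> segment(uint16_t src, const std::vector<uint8_t>& signal) {
    std::vector<Flit> flits;
    for (std::size_t off = 0; off < signal.size(); off += kFlitBytes) {
        std::size_t len = std::min(kFlitBytes, signal.size() - off);
        Flit f;
        f.sourceId = src;
        f.seq = static_cast<uint16_t>(off / kFlitBytes);
        f.last = (off + len == signal.size());
        f.data.assign(signal.begin() + off, signal.begin() + off + len);
        flits.push_back(std::move(f));
    }
    return flits;
}

int main() {
    std::vector<uint8_t> sig(40, 0xAB);  // a 40-byte Signal
    auto flits = segment(3, sig);        // segmented into 16 + 16 + 8 bytes
    return (flits.size() == 3 && flits.back().last) ? 0 : 1;
}
```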
  • FIG. 9 is a simplified schematic diagram illustrating a FlowTile having an Adapter to interface with a device external to the FlowTile. FlowTile 140 is in communication with Adapter 144. Adapter 144 can provide an interface for chip-to-chip communication in accordance with one embodiment. For example, Adapter 144 may provide a packet interface in order to transfer packets between devices. It should be appreciated that the Adapter can be designed so that the interface is application-specific. In one embodiment, some of the FlowTiles, e.g., on the periphery of the array, are configured to interface with the external world, e.g., other chips. This external interface is also a Signal-based interface, accomplished through Adapter 144 as shown in FIG. 9.
  • The FlowLogicMachine can itself be thought of as an array of structurally configurable memory units that implements a plurality of FlowTiles, where the computational logic is fixed and distributed. As a further analogy, the FlowLogic language described herein may be thought of as the JAVA language, while the FlowLogicMachine may be analogized to the JAVA Virtual Machine, since the FlowLogic language has some attributes of object-oriented programming languages. One skilled in the art will appreciate that much of the resources in question are memory units in one form or another, i.e., code, state, stack, channels, and buffers. Motivated by this observation, the FlowLogicMachine is designed to provide the ability to configure these memory units, also referred to as memory resources, as required by a particular application, and the FlowLogic representation allows the flexibility of re-casting a system description in flexible ways to achieve the targeted capacity, performance, and functionality.
  • The FlowLogicMachine has novel features that aid, among other things, in system diagnosis. FlowGates are atomic by design and always run to completion once fired. There is no notion of run-time instruction-level single-stepping in the context of the FlowLogicMachine; instead, it can be stepped on FlowGate boundaries. FlowTiles can be instructed to execute one FlowGate at a time. An external debug controller can observe the StateMemory, ChannelMemory, and other partitions of the FVM by making explicit system read calls when the FlowLogicMachine is paused between steps of FlowGate execution. The debug controller may even launch DynamicFlowGates to achieve diagnostic goals. The FlowLogicMachine has built-in FlowGates called SystemFlowGates for read, write, and configuration purposes. The SystemFlowGates come into existence at device boot, independent of applications, and are also used for booting application-specific FVMs.
  • The embodiments described herein also support runtime debugging of the FlowLogicMachine. The FlowLogic runtime system can be controlled from an outside machine (host) through sending and receiving of signals with specific debugging payloads. The host sends debugging commands to the runtime system in signals; it also receives data and state information back from the runtime system in signals.
  • The following debugging techniques are supported by the FlowLogicMachine (a sketch of one possible command encoding follows this list):
      • The runtime system can be instructed to pause (break) execution on a given condition. These conditions may include invocation of a specific FlowGate, the contents of any input signal, any expression on FlowGate invocations (e.g., the nth invocation of a given FlowGate), or any other internal state of the runtime system. Upon halting execution, the runtime system notifies the host by sending a signal indicating that execution has stopped. The host can then control the debugging process by sending further instructions encapsulated in signals.
      • The runtime system can be instructed to resume execution (step) until a given condition. This is analogous to single-stepping in a compiled code environment. Several variants of this behavior are supported, such as “step to the next FlowGate invocation”, “step to the nth invocation of a given FlowGate”, or “step until a FlowGate receives a signal with a given content”.
      • The runtime system can be instructed to capture information (trace) about selected or all FlowGate invocations and communicate this information to the host. The information communicated is essentially a trace of the firings of FlowGates, their input signals, and their output signals.
      • The runtime system can be instructed to query certain memory areas in the tile and return data (dump) to the host system. The information communicated can be the current positions of the context pointers (such as MP), the contents of any memory or a sub-range of that memory, or the current utilization of VirtualChannels.
      • To support diagnostics and debugging, executable FlowGate code can be sent from the host to the runtime system of a given FlowTile. The runtime system will load this code into its CodeMemory and execute it to support the debugging session.
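As referenced above, one possible host-side encoding of these debugging commands is sketched below in C++. The patent names the operations (break, step, trace, dump, and diagnostic code load) but specifies no wire format, so every field and enumerator here is a hypothetical illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical encoding of host debugging commands carried in a Signal
// payload; all fields below are assumptions, not the patented format.
enum class DebugOp : uint8_t {
    Break,     // pause on a condition, e.g. a specific FlowGate invocation
    Step,      // resume until a condition, e.g. the nth gate invocation
    Trace,     // report FlowGate firings with input and output signals
    Dump,      // return memory contents or VirtualChannel utilization
    LoadGate   // install executable FlowGate code for diagnostics
};

struct DebugCommand {
    DebugOp op;
    uint32_t flowGateId;           // gate named by a Break/Step/Trace condition
    uint32_t invocationCount;      // "nth invocation" for Break and Step
    uint32_t addr;                 // start of the memory sub-range for Dump
    uint32_t len;                  // length of that sub-range
    std::vector<uint8_t> payload;  // relocatable code for LoadGate
};

int main() {
    // Host-side construction of "break on the 3rd invocation of gate 7".
    DebugCommand cmd{DebugOp::Break, 7, 3, 0, 0, {}};
    return cmd.op == DebugOp::Break ? 0 : 1;
}
```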
  • One skilled in the art may note that FlowLogic is not a general method for describing any digital system for system-on-chip implementation. Some of its notable distinctions include:
      • 1. It raises the level of abstraction for design capture, verification, and analysis. To allow for implementation flexibility, it is not required to preserve cycle accuracy among different levels of design representation.
      • 2. At a higher level of design capture, it is not deemed necessary to support arbitrary combinational-logic-oriented systems efficiently.
      • 3. The performance of the system designed using FlowLogic depends on the mix of workload used in simulation.
      • 4. Functionality and performance of FlowLogic designs are not efficiently implemented on systems that primarily span bandwidth-constrained networks. FlowLogic is optimized for implementation on bandwidth over-provisioned on-chip intelligent memory with Flit-based communications.
  • FlowLogic relies on the assumption that quantitative behavior at the FlowLogic level is perturbed minimally as it is translated to the physical implementation.
  • The embodiments described above provide a memory-centric approach to processing system design and architecture, as well as the FlowLogic language and the design, synthesis, and place-and-route techniques for this unique processing system. Terms of the FlowLogic language have been analogized to object-oriented terms for ease of understanding. For example, a FlowGate may be thought of as a Function, Procedure, or Task, while a FlowModule may be analogized to an object in object-oriented programming. A Signal may be referred to as a message or a packet. It should be appreciated that while these analogies are used for explanatory purposes, there are significant differences between the embodiments described herein and the corresponding analogies.
  • Traditional processors incorporate the notion of virtual memory to push physical memory away from the processing core; to do so, they introduce accumulators, registers, and caching hierarchies. The embodiments described above instead incorporate the processing core(s) directly within the physical memory. Furthermore, the data paths in the above-described embodiments are significantly different from the data paths within a traditional processor architecture.
  • FIG. 10 is a flowchart diagram illustrating the method operations for configuring and programming a semiconductor circuit device having a multiple-level array of memory storage cells in accordance with one embodiment of the invention. The method initiates with operation 400, where the initial FlowLogic source code is provided. In operation 402, the FlowLogic source code is parsed. In decision operation 404, it is determined whether any errors, e.g., syntax errors, exist in the source code. Since FlowLogic supports a subset of C++ in one embodiment, it should be appreciated that this check will reveal any syntax issues. If an error does exist, the method returns to operation 400, where the error is corrected, and the method resumes as described above. If no error is detected, the method advances to operation 406, where the FlowLogic source code is in a state in which portions of the code are in C++ form. In operation 408, the FlowLogic modules are instantiated through an elaboration process. Here, the source code containing a description of a network is converted to code representing FlowLogic instances, i.e., a network of instances is provided. This results in the FlowLogic Instance source code, as represented in operation 410.
  • Still referring to FIG. 10, in operation 412, the FlowLogic Instances are compiled into corresponding FVMs. The compiled FVMs are checked for compile errors in operation 414. If compile errors are found in operation 414, the method returns to operation 400 and repeats as described above. If there are no compile errors, the compiled FVMs are made available in operation 416. The compiled FVMs are input into a simulator in operation 418, wherein a functional simulation and an instruction-level simulation are performed. It should be appreciated that the source code from operation 400 is used to provide the function-level simulation, while the compiled FVMs are used to provide the instruction-level simulation. In operation 420, a mapper aggregates the FVMs into AFVMs and maps the AFVMs to FLA (FlowLogicArray) Tiles. Here, each AFVM is mapped into a portion of the multiple-level array of memory storage cells. Additionally, the multi-way access paths of the multiple-level array are configured according to the multiple FVMs in operation 420. Thereafter, the portion of the multiple-level array is programmed to function according to the multiple FVMs. The method terminates in operation 422, where the FLA (FlowLogicArray) is defined as a chip in silicon.
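The FIG. 10 flow can be summarized as a straight-line toolchain driver. In the C++ skeleton below, every stage is a stub standing in for the real FlowLogic tools; the function names and signatures are illustrative assumptions keyed to the operation numbers in the flowchart.

```cpp
#include <string>
#include <vector>

// Stand-ins for the real tool outputs; purely illustrative.
struct Fvm {};        // a compiled FlowVirtualMachine (opaque here)
struct FlaTile {};    // a mapped FlowLogicArray tile (opaque here)

static std::string parse(const std::string& src) { return src; }      // 402-406
static std::vector<std::string> elaborate(const std::string& s) {     // 408-410
    return {s};       // expands the network description into instances
}
static std::vector<Fvm> compileFvms(const std::vector<std::string>& inst) {
    return std::vector<Fvm>(inst.size());                              // 412-416
}
static void simulate(const std::vector<Fvm>&) {}                       // 418
static std::vector<FlaTile> mapToTiles(const std::vector<Fvm>&) {      // 420
    return {};        // aggregate FVMs into AFVMs, map AFVMs to FLA tiles
}

int main() {
    auto checked   = parse("flowlogic source");  // syntax errors -> back to 400
    auto instances = elaborate(checked);         // network of FlowLogic instances
    auto fvms      = compileFvms(instances);     // compile errors -> back to 400
    simulate(fvms);                              // functional + instruction level
    mapToTiles(fvms);                            // program the multi-level array
    return 0;
}
```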
  • The invention has been described herein in terms of several exemplary embodiments. Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention. The embodiments and preferred features described above should be considered exemplary, with the invention being defined by the appended claims.
  • With the above embodiments in mind, it should be understood that the invention may employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms, such as producing, identifying, determining, or comparing.
  • Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
  • The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
  • Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.

Claims (24)

1. A method for synthesizing a stateful, transaction-oriented system for flexible mapping to a structurally field-configurable semiconductor device having a multi-level array of storage elements, for in-memory processing, comprising method operations of:
mapping FlowLogic to a network of FlowVirtualMachines(FVM);
mapping a FlowModule into a corresponding FlowVirtualMachine (FVM);
integrating one or more FVMs into an AggregateFVM (AFVM);
composing one or more AFVMs into a FlowTile; and
routing Signals between FlowModules.
2. The method of claim 1, wherein the FVM is an array of similar memory unit resources configured into partitions, the partitions accessible via a plurality of independent access paths.
3. The method of claim 1, wherein the partitions define a FlowGateIndex, a StackMemory space, a CodeMemory space, a StateMemory space, an OutputBuffer space and a ChannelMemory space.
4. The method of claim 2, further comprising:
relocating the partitions; and
repeating the method operations of mapping FlowLogic to a network of FlowVirtualMachines(FVM);
mapping a FlowModule into a corresponding FlowVirtualMachine (FVM);
integrating one or more FVMs into an AggregateFVM (AFVM);
composing one or more AFVMs into a FlowTile; and
routing Signals between FlowModules.
5. The method of claim 1 wherein the AFVM is derived from a composition of FVMs by one of linearly aggregating, merging or sharing memory unit resources of the composition of FVMs.
6. The method of claim 1, wherein the FlowTile is derived from a composition of AFVMs by one of linearly aggregating, merging or sharing of the memory unit resources of the composition of AFVMs.
7. The method of claim 1, wherein the FlowTile provides scheduling functionality through run-time flow control, reception of Signals and invoking of appropriate FlowGates.
8. The method of claim 3, wherein the FlowTile enables signals to be commuted out of the OutputBuffer space and into the ChannelMemory space.
9. The method of claim 2, wherein the FVM is without memory or caching hierarchies, and wherein all elements in the partitions are accessible in a same access time, the method further comprising:
allocating and defining initial contents for all memories at compile time.
10. The method of claim 1, further comprising:
designating SystemFlowGates that are application independent, built-in and available on power-on boot;
providing access to the storage elements for read, write and configuration operations; and
providing booting application specific FVMs.
11. The method of claim 1, further comprising:
splitting the Signals into two portions, a first portion defining header information and a second portion defining a payload, the first portion residing in a different part of the memory from the second portion.
12. A method for routing FlowLogic Signals over a structurally configurable in-memory processing array, the method comprising:
configuring a pool of memory resource units into corresponding OutputBuffers, CommuteBuffers and ChannelMemories, the pool of memory units shared with a FlowLogicMachine;
configuring a producer-consumer relationship between the corresponding OutputBuffers and CommuteBuffers;
configuring a producer-consumer relationship between the CommuteBuffers and VirtualChannels residing in the ChannelMemories;
configuring producer-consumer relationships between the OutputBuffers and VirtualChannels residing in said ChannelMemories;
configuring producer-consumer relationships between the CommuteBuffers and neighbouring CommuteBuffers.
13. The method of claim 12 wherein configuring a producer-consumer relationship between the corresponding OutputBuffers and CommuteBuffers includes,
enabling simultaneous access of the memory resource units through independent ports, asynchronous clocks and physical addressing; and
segmenting signals into small fixed size entities (Flits).
14. The method of claim 12 wherein configuring producer-consumer relationship between the CommuteBuffers and VirtualChannels residing in the ChannelMemories includes,
enabling simultaneous access of the memory resource units through independent ports, asynchronous clocks and physical addressing;
reassembling the small fixed size entities in the VirtualChannels;
segregating small fixed size entities arriving simultaneously for different signals from different sources prior to the reassembling.
15. The method of claim 14, wherein physically addressed writes into corresponding memory units achieve the reassembling and the segregating.
16. The method of claim 12 wherein configuring a producer-consumer relationship between the OutputBuffers and VirtualChannels includes,
enabling simultaneous access of the memory resource units through independent ports, asynchronous clocks and physical addressing;
reassembling Flits into Signals in the VirtualChannels; and
segregating Flits arriving simultaneously for different Signals from different sources prior to reassembly, wherein the reassembling and the segregating are achieved through physically addressed writes into corresponding memory units.
17. The method of claim 12, wherein the method operation of configuring producer-consumer relationships between the CommuteBuffers and neighbouring CommuteBuffers includes,
enabling simultaneous access of the memory resource units through independent ports, asynchronous clocks and physical addressing; and
switching an input Flit from a neighbor to a corresponding CommuteBuffer.
18. The method of claim 12, wherein the pool of memory resource units are single ported memories with time division access.
19. The method of claim 12, wherein the pool of memory resource units are enabled for synchronous access using a global clock.
20. A method for debugging a stateful, transaction-oriented runtime system having a multi-level array of storage elements, comprising method operations of:
instructing the stateful, transaction-oriented system to pause;
instructing the stateful, transaction-oriented system to single step until a given point;
tracking information for selected FlowGate invocations; and
querying contents of a portion within the multi-level array of storage elements.
21. The method of claim 20, wherein the method operation of instructing the stateful, transaction-oriented system to pause includes,
transmitting a signal to a host system indicating that the system has paused; and
controlling the debugging process through the host system by sending further instructions encapsulated in signals.
22. The method of claim 20, wherein the method operation of tracking information for selected FlowGate invocations includes,
tracing of firings of FlowGates, including FlowGate input signals, and FlowGate output signals.
23. The method of claim 20, wherein the method operation of querying contents of the portion within the multi-level array of storage elements includes,
communicating information to a host system, the information including current position of context pointers, contents of a portion of the multi-level array of storage elements or utilization of VirtualChannels.
24. The method of claim 20, further comprising:
sending executable FlowGate code from a host to the runtime system of a given tile;
loading the FlowGate code into the multi-level storage array; and
executing the FlowGate code.
US12/906,967 2005-06-27 2010-10-18 Method for Specifying Stateful, Transaction-Oriented Systems for Flexible Mapping to Structurally Configurable In-Memory Processing Semiconductor Device Abandoned US20110035722A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/906,967 US20110035722A1 (en) 2005-06-27 2010-10-18 Method for Specifying Stateful, Transaction-Oriented Systems for Flexible Mapping to Structurally Configurable In-Memory Processing Semiconductor Device

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US69453705P 2005-06-27 2005-06-27
US69453805P 2005-06-27 2005-06-27
US69454605P 2005-06-27 2005-06-27
US11/426,882 US7849441B2 (en) 2005-06-27 2006-06-27 Method for specifying stateful, transaction-oriented systems for flexible mapping to structurally configurable, in-memory processing semiconductor device
US12/906,967 US20110035722A1 (en) 2005-06-27 2010-10-18 Method for Specifying Stateful, Transaction-Oriented Systems for Flexible Mapping to Structurally Configurable In-Memory Processing Semiconductor Device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/426,882 Division US7849441B2 (en) 2005-06-27 2006-06-27 Method for specifying stateful, transaction-oriented systems for flexible mapping to structurally configurable, in-memory processing semiconductor device

Publications (1)

Publication Number Publication Date
US20110035722A1 true US20110035722A1 (en) 2011-02-10

Family

ID=37596000

Family Applications (5)

Application Number Title Priority Date Filing Date
US11/426,887 Expired - Fee Related US7676783B2 (en) 2005-06-27 2006-06-27 Apparatus for performing computational transformations as applied to in-memory processing of stateful, transaction oriented systems
US11/426,880 Expired - Fee Related US7614020B2 (en) 2005-06-27 2006-06-27 Structurally field-configurable semiconductor array for in-memory processing of stateful, transaction-oriented systems
US11/426,882 Expired - Fee Related US7849441B2 (en) 2005-06-27 2006-06-27 Method for specifying stateful, transaction-oriented systems for flexible mapping to structurally configurable, in-memory processing semiconductor device
US12/561,460 Abandoned US20100008155A1 (en) 2005-06-27 2009-09-17 Structurally field-configurable semiconductor array for in-memory processing of stateful, transaction-oriented systems
US12/906,967 Abandoned US20110035722A1 (en) 2005-06-27 2010-10-18 Method for Specifying Stateful, Transaction-Oriented Systems for Flexible Mapping to Structurally Configurable In-Memory Processing Semiconductor Device

Family Applications Before (4)

Application Number Title Priority Date Filing Date
US11/426,887 Expired - Fee Related US7676783B2 (en) 2005-06-27 2006-06-27 Apparatus for performing computational transformations as applied to in-memory processing of stateful, transaction oriented systems
US11/426,880 Expired - Fee Related US7614020B2 (en) 2005-06-27 2006-06-27 Structurally field-configurable semiconductor array for in-memory processing of stateful, transaction-oriented systems
US11/426,882 Expired - Fee Related US7849441B2 (en) 2005-06-27 2006-06-27 Method for specifying stateful, transaction-oriented systems for flexible mapping to structurally configurable, in-memory processing semiconductor device
US12/561,460 Abandoned US20100008155A1 (en) 2005-06-27 2009-09-17 Structurally field-configurable semiconductor array for in-memory processing of stateful, transaction-oriented systems

Country Status (4)

Country Link
US (5) US7676783B2 (en)
EP (1) EP1899877A4 (en)
JP (1) JP2009505171A (en)
WO (1) WO2007002717A2 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7464229B1 (en) * 2005-01-07 2008-12-09 Sun Microsystems, Inc. Serial-write, random-access read, memory
US7386823B2 (en) * 2005-07-20 2008-06-10 Springsoft, Inc. Rule-based schematic diagram generator
US20080244476A1 (en) * 2007-04-02 2008-10-02 Athena Design Systems, Inc. System and method for simultaneous optimization of multiple scenarios in an integrated circuit design
US9081901B2 (en) * 2007-10-31 2015-07-14 Raytheon Company Means of control for reconfigurable computers
FR2927438B1 (en) * 2008-02-08 2010-03-05 Commissariat Energie Atomique METHOD FOR PRECHARGING IN A MEMORY HIERARCHY CONFIGURATIONS OF A RECONFIGURABLE HETEROGENETIC INFORMATION PROCESSING SYSTEM
US8108574B2 (en) * 2008-10-08 2012-01-31 Lsi Corporation Apparatus and methods for translation of data formats between multiple interface types
US8589851B2 (en) * 2009-12-15 2013-11-19 Memoir Systems, Inc. Intelligent memory system compiler
US8484415B2 (en) * 2010-07-19 2013-07-09 Taejin Info Tech Co., Ltd. Hybrid storage system for a multi-level raid architecture
US8875079B2 (en) * 2011-09-29 2014-10-28 Lsi Corporation System and method of automated design augmentation for efficient hierarchical implementation
CN103186359B (en) * 2011-12-30 2018-08-28 南京中兴软件有限责任公司 Hardware abstraction data structure, data processing method and system
US9009703B2 (en) 2012-05-10 2015-04-14 International Business Machines Corporation Sharing reconfigurable computing devices between workloads
UA111169C2 (en) * 2013-03-15 2016-04-11 Анатолій Анатолійович Новіков METHOD OF OPERATION OF NP-PROCESSOR
JP5944358B2 (en) * 2013-09-10 2016-07-05 株式会社東芝 Semiconductor integrated circuit verification apparatus, semiconductor integrated circuit verification method, and program
CN106649897B (en) * 2015-10-28 2019-11-15 北京华大九天软件有限公司 One seed units array splices preprocess method
JP2019095952A (en) * 2017-11-21 2019-06-20 ソニーセミコンダクタソリューションズ株式会社 Processor, information processing device and processing method
KR20200085515A (en) * 2019-01-07 2020-07-15 에스케이하이닉스 주식회사 Data Storage Device, Operation Method Thereof, and Controller Therefor

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020085578A1 (en) * 2000-12-15 2002-07-04 Dell Martin S. Three-stage switch fabric with buffered crossbar devices
US20030046396A1 (en) * 2000-03-03 2003-03-06 Richter Roger K. Systems and methods for managing resource utilization in information management environments
US6598086B1 (en) * 1999-11-09 2003-07-22 International Business Machines Corporation Method and system for controlling information flow in a high frequency digital system from a producer to a buffering consumer via an intermediate buffer
US6654830B1 (en) * 1999-03-25 2003-11-25 Dell Products L.P. Method and system for managing data migration for a storage system
US20040085955A1 (en) * 2002-10-31 2004-05-06 Brocade Communications Systems, Inc. Method and apparatus for encryption of data on storage units using devices inside a storage area network fabric
US20050018609A1 (en) * 1999-05-21 2005-01-27 Avici Systems, Inc. Fabric router with flit caching
US6993634B2 (en) * 2002-04-29 2006-01-31 Intel Corporation Active tracking and retrieval of shared memory resource information
US20060161419A1 (en) * 2005-01-20 2006-07-20 Russ Herrell External emulation hardware
US20060271327A1 (en) * 2005-05-31 2006-11-30 David Haggerty Systems and methods for managing multi-device test sessions

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5734921A (en) * 1990-11-13 1998-03-31 International Business Machines Corporation Advanced parallel array processor computer package
JPH09197011A (en) * 1996-01-18 1997-07-31 Matsushita Electric Ind Co Ltd Method for mapping field programmable gate array
US6567837B1 (en) * 1997-01-29 2003-05-20 Iq Systems Object oriented processor arrays
US5857097A (en) * 1997-03-10 1999-01-05 Digital Equipment Corporation Method for identifying reasons for dynamic stall cycles during the execution of a program
TW360822B (en) * 1997-03-31 1999-06-11 Ibm Method of making a memory fault-tolerant using a variable size redundancy replacement configuration
US7036106B1 (en) * 2000-02-17 2006-04-25 Tensilica, Inc. Automated processor generation system for designing a configurable processor and method for the same
US20040022094A1 (en) * 2002-02-25 2004-02-05 Sivakumar Radhakrishnan Cache usage for concurrent multiple streams
US7653912B2 (en) 2003-05-30 2010-01-26 Steven Frank Virtual processor methods and apparatus with unified event notification and consumer-producer memory operations
US6845059B1 (en) * 2003-06-26 2005-01-18 International Business Machines Corporation High performance gain cell architecture
US20050097146A1 (en) * 2003-08-21 2005-05-05 Konstantinou Alexander V. Methods and systems for autonomously managing a network
US7023056B2 (en) * 2003-11-26 2006-04-04 Taiwan Semiconductor Manufacturing Company, Ltd. Memory cell structure
TWI283467B (en) * 2003-12-31 2007-07-01 Advanced Semiconductor Eng Multi-chip package structure
US7426668B2 (en) * 2004-11-18 2008-09-16 Nilanjan Mukherjee Performing memory built-in-self-test (MBIST)
US7183819B2 (en) * 2004-12-30 2007-02-27 Lucent Technologies Inc. Method and circuit configuration for synchronous resetting of a multiple clock domain circuit
CA2540474A1 (en) * 2005-04-01 2006-10-01 Uti Limited Partnership Cytometer

Also Published As

Publication number Publication date
JP2009505171A (en) 2009-02-05
US7676783B2 (en) 2010-03-09
EP1899877A2 (en) 2008-03-19
US20100008155A1 (en) 2010-01-14
US7614020B2 (en) 2009-11-03
EP1899877A4 (en) 2011-12-28
WO2007002717A3 (en) 2009-04-16
WO2007002717A2 (en) 2007-01-04
US20060294490A1 (en) 2006-12-28
US7849441B2 (en) 2010-12-07
US20070150854A1 (en) 2007-06-28
US20060294483A1 (en) 2006-12-28

Similar Documents

Publication Publication Date Title
US7849441B2 (en) Method for specifying stateful, transaction-oriented systems for flexible mapping to structurally configurable, in-memory processing semiconductor device
US10062422B2 (en) Various methods and apparatus for configurable mapping of address regions onto one or more aggregate targets
US10698853B1 (en) Virtualization of a reconfigurable data processor
US11886930B2 (en) Runtime execution of functions across reconfigurable processor
Pellauer et al. Buffets: An efficient and composable storage idiom for explicit decoupled data orchestration
US8365111B2 (en) Data driven logic simulation
Wang et al. Spread: A streaming-based partially reconfigurable architecture and programming model
US20100115196A1 (en) Shared storage for multi-threaded ordered queues in an interconnect
JPH08508599A (en) Virtual interconnect for reconfigurable logical systems
US11182264B1 (en) Intra-node buffer-based streaming for reconfigurable processor-as-a-service (RPaaS)
US11809908B2 (en) Runtime virtualization of reconfigurable data flow resources
Bucknall Build framework and runtime abstraction for partial reconfiguration on FPGA SoCs
Hosseinabady et al. Fast and low overhead architectural transaction level modelling for large-scale network-on-chip simulation
Kosonen NETWORK-ON-CHIP PERFORMANCE MODELING
Flich et al. Deeply heterogeneous many-accelerator infrastructure for HPC architecture exploration
Verdier et al. Exploring RTOS issues with a high-level model of a reconfigurable SoC platform.
Gruian et al. NoC-based CSP support for a Java chip multiprocessor
Viskic Modeling and synthesis of communication software for multi-processor systems-on-chip

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION