US20070044106A2 - Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts - Google Patents
- Publication number
- US20070044106A2 (application US 11/330,916)
- Authority
- US
- United States
- Prior art keywords
- tcs
- thread
- tlb
- asid
- updating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4812—Task transfer initiation or dispatching by interrupt, e.g. masked
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present invention relates in general to the field of multithreaded microprocessors, and particularly to execution of multiprocessor operating systems thereon.
- the performance improvements of pipelining may be realized to the extent that the instructions in the program permit it, namely to the extent that an instruction does not depend upon its predecessors in order to execute and can therefore execute in parallel with its predecessors, which is commonly referred to as instruction-level parallelism.
- Another way in which instruction-level parallelism is exploited by contemporary microprocessors is the issuing of multiple instructions for execution per clock cycle. These microprocessors are commonly referred to as superscalar microprocessors.
- a thread is simply a sequence, or stream, of program instructions.
- a multithreaded microprocessor concurrently executes multiple threads according to some scheduling policy that dictates the fetching and issuing of instructions of the various threads, such as interleaved, blocked, or simultaneous multithreading.
- a multithreaded microprocessor typically allows the multiple threads to share the functional units of the microprocessor (e.g., instruction fetch and decode units, caches, branch prediction units, and load/store, integer, floating-point, SIMD, etc. execution units) in a concurrent fashion.
- multithreaded microprocessors include multiple sets of resources, or contexts, for storing the unique state of each thread, such as multiple program counters and general purpose register sets, to facilitate the ability to quickly switch between threads to fetch and issue instructions.
- the multithreading microprocessor does not have to save and restore these resources when switching between threads, thereby potentially reducing the average number of clock cycles per instruction.
- One example of a performance-constraining issue addressed by multithreading microprocessors is the fact that accesses to memory outside the microprocessor that must be performed due to a cache miss typically have a relatively long latency. It is common for the memory access time of a contemporary microprocessor-based computer system to be between one and two orders of magnitude greater than the cache hit access time. Instructions dependent upon the data missing in the cache are stalled in the pipeline waiting for the data to come from memory. Consequently, some or all of the pipeline stages of a single-threaded microprocessor may be idle performing no useful work for many clock cycles.
- Multithreaded microprocessors may solve this problem by issuing instructions from other threads during the memory fetch latency, thereby enabling the pipeline stages to make forward progress performing useful work, somewhat analogously to, but at a finer level of granularity than, an operating system performing a task switch on a page fault.
- Other examples of performance-constraining issues addressed by multithreading microprocessors are pipeline stalls and their accompanying idle cycles due to a data dependence; or due to a long latency instruction such as a divide instruction, floating-point instruction, or the like; or due to a limited hardware resource conflict.
- the ability of a multithreaded microprocessor to issue instructions from independent threads to pipeline stages that would otherwise be idle may significantly reduce the time required to execute the program or collection of programs comprising the threads.
- Multiprocessing is a technique related to multithreading that exploits thread-level parallelism, albeit at a higher system level, to execute a program or collection of programs faster.
- multiple processors, or CPUs, share a memory system and I/O devices.
- a multiprocessor (MP) operating system facilitates the simultaneous execution of a program or collection of programs on the multiprocessor system.
- the system may include multiple Pentium IV processors all sharing a memory and I/O subsystem, running an MP operating system (such as Linux SMP, an MP-capable version of Windows, Sun Solaris, etc.) and executing one or more application programs concurrently.
- Multithreading microprocessors exploit thread-level parallelism at an even lower level than multiprocessor systems by sharing instruction fetch, issue, and execution resources, as described above, in addition to sharing a memory system and I/O devices.
- An MP operating system may run on a multithreading microprocessor if the multithreading microprocessor presents multiple processors, or CPUs, in an architected manner recognized by the MP operating system.
- HT: Hyper-Threading
- An HT Xeon includes effectively the same execution resources (e.g., caches, execution units, branch predictors) as a non-HT Xeon processor, but replicates the architectural state to present multiple distinct logical processors to an MP OS. That is, the MP operating system recognizes each logical processor as a separate processor, or CPU, each presenting the architecture of a single processor. The cost of replicating the architectural state for the additional logical processor in the Xeon in terms of additional chip size and power consumption is almost 5%.
- an exception is an error or other unusual condition or event that occurs during the execution of a program.
- the processor saves the state of the currently executing program and begins fetching and executing instructions at a predefined address, thereby transferring execution to an alternate program, commonly referred to as an exception handler, located at the predefined address.
- the predefined address may be common to all exceptions in the list of architected exception types or may be unique to some or all of the exception types.
- the exception handler, when appropriate, may restore the state and resume execution of the previously executing program.
- Examples of common exceptions include a page fault, a divide by zero, a faulty address generated by the program, a bus error encountered by the processor when attempting to read a memory location, or an invalid instruction exception caused by an invalid instruction opcode or invalid instruction operand.
- Interrupts are typically grouped as hardware interrupts and software interrupts.
- a software interrupt is generated when the currently executing program executes an architected software interrupt instruction, which causes an exception that transfers control to the architected interrupt vector associated with the software interrupt to invoke an interrupt service routine, or handler.
- a hardware interrupt is a signal received by the processor from a device to request service by the processor. Examples of interrupting devices are disk drives, direct memory access controllers, and timers.
- the processor transfers control to an architected interrupt vector associated with the interrupt request to invoke an interrupt service routine, or handler.
- each processor includes an interrupt controller, which enables each processor to direct an interrupt specifically to each of the other processors.
- the HT Xeon processors, for example, include a replicated Advanced Programmable Interrupt Controller (APIC) for each logical processor, which enables each logical processor to send a hardware interrupt specifically to each of the other logical processors.
- one use of an inter-processor interrupt (IPI) is in preemptive time-sharing operating systems, which receive periodic timer interrupts, in response to which the operating system may perform a task switch on one or more of the processors to schedule a different task or process to execute on the processors.
- the timer handling routine running on the processor that receives the timer interrupt not only schedules the tasks on its own processor, but also directs an interrupt to each of the other processors to cause them to schedule their tasks.
- Each processor has an architected interrupt mechanism, which the timer interrupt-receiving processor uses to direct an IPI to each of the other processors in the multiprocessor system.
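The timer-driven scheduling pattern described above can be sketched in C as a small simulation. This is only an illustration under stated assumptions: `NUM_CPUS`, `struct cpu`, `send_ipi`, `schedule`, `timer_interrupt`, `ipi_handler`, and `reschedule_count` are hypothetical names, and the "interrupt" is modeled as a flag rather than a real architected mechanism.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CPUS 4  /* hypothetical system size, not from the source */

/* Hypothetical per-CPU bookkeeping used only by this sketch. */
struct cpu {
    bool ipi_pending;  /* set when another CPU directs an IPI here */
    int  reschedules;  /* how many times this CPU ran its scheduler */
};

static struct cpu cpus[NUM_CPUS];

/* Stand-in for the architected mechanism that directs an IPI to one CPU. */
static void send_ipi(int target)
{
    assert(target >= 0 && target < NUM_CPUS);
    cpus[target].ipi_pending = true;
}

/* Stand-in for the operating system's task switch on one CPU. */
static void schedule(int cpu)
{
    cpus[cpu].reschedules++;
}

/* Runs on the CPU that received the periodic timer interrupt:
 * it schedules its own tasks, then directs an IPI to every other CPU. */
void timer_interrupt(int self)
{
    schedule(self);
    for (int cpu = 0; cpu < NUM_CPUS; cpu++) {
        if (cpu != self)
            send_ipi(cpu);
    }
}

/* Runs on a CPU when it services a pending IPI. */
void ipi_handler(int self)
{
    if (cpus[self].ipi_pending) {
        cpus[self].ipi_pending = false;
        schedule(self);
    }
}

/* Accessor so callers can observe the effect. */
int reschedule_count(int cpu)
{
    return cpus[cpu].reschedules;
}
```

Calling `timer_interrupt(0)` reschedules CPU 0 immediately; the other CPUs reschedule only once each runs `ipi_handler`, which is why the scheme depends on being able to direct an interrupt at a specific CPU.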
- MT: MIPS® Multithreading
- ASE: Application-Specific Extension
- ISA: MIPS Instruction Set Architecture
- PRA: MIPS Privileged Resource Architecture
- the MIPS MT ASE allows two distinct, but not mutually-exclusive, multithreading capabilities.
- a single MIPS MT ASE microprocessor core comprises one or more Virtual Processing Elements (VPEs), and each VPE comprises one or more thread contexts (TCs).
- an N-VPE processor core presents to an SMP operating system an N-way symmetric multiprocessor.
- it presents to the SMP operating system N MIPS32® Architecture processors.
- SMP operating systems configured to run on a conventional multiprocessor system having N MIPS32 processors without the MT ASE capability will run on a single MIPS32 core with the MT ASE capabilities with little or no modifications to the SMP operating system.
- each VPE presents an architected exception domain to the SMP operating system including an architected list of exceptions that the VPE will handle. The list includes interrupts that one VPE may direct to another specific VPE in the multithreading microprocessor, somewhat similar to the HT Xeon approach.
- each VPE comprises at least one thread context, and may comprise multiple thread contexts.
- a thread context in the MIPS MT ASE comprises a program counter representation, a set of general purpose registers, a set of multiplier result registers, and some of the MIPS PRA Coprocessor 0 state, such as state describing the execution privilege level and address space identifier (ASID) of each thread context.
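As a rough picture of how little state a thread context replicates, the elements listed above can be written as a C structure. This is a sketch, not the architected layout: the field names, the 32-bit widths (assuming a MIPS32 core), the `kernel_mode` flag standing in for "execution privilege level", and the `tc_init` helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of per-thread-context state on a MIPS32 core. */
struct thread_context {
    uint32_t pc;           /* program counter representation */
    uint32_t gpr[32];      /* general purpose registers */
    uint32_t hi, lo;       /* multiplier result registers */
    uint8_t  asid;         /* address space identifier (Coprocessor 0 state) */
    uint8_t  kernel_mode;  /* stand-in for the execution privilege level */
    /* Note: no TLB and no floating point state here; in this model
     * those belong to state shared at a higher level. */
};

/* Hypothetical helper: prepare a thread context to start a thread. */
void tc_init(struct thread_context *tc, uint32_t entry_pc, uint8_t asid)
{
    assert(tc != NULL);
    memset(tc, 0, sizeof *tc);
    tc->pc = entry_pc;
    tc->asid = asid;
    tc->kernel_mode = 1;
}
```

The structure deliberately omits anything like a TLB or interrupt controller, which is the sense in which a thread context is lighter than a full CPU.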
- the thread contexts are relatively lightweight compared to VPEs with respect to storage elements required to store state and are therefore less expensive than VPEs in terms of chip area and power consumption.
- the lightweight feature of MIPS MT ASE thread contexts makes them inherently more scalable than VPEs, and potentially than Intel HT logical processors, for example.
- the domain for exception handling is at the VPE level, not the thread context level.
- a VPE handles asynchronous exceptions, such as interrupts, opportunistically. That is, when an asynchronous exception is raised to the VPE, the VPE selects one of the eligible (i.e., not marked as exempt from servicing asynchronous exceptions) thread contexts to execute the exception handler.
- the thread context cannot specify to the VPE which thread context should handle the exception within the VPE in a MIPS MT ASE processor, i.e., the exception architecture does not provide an explicit way for the thread context to direct an asynchronous exception to a specific other thread context.
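The opportunistic selection described above can be sketched as follows. The source does not say how the VPE picks among eligible thread contexts, so choosing the lowest-numbered eligible one is purely an assumption; `struct tc_state`, its `exempt` flag, and `vpe_select_handler_tc` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread-context flag relevant to exception delivery. */
struct tc_state {
    bool exempt;  /* marked as exempt from servicing asynchronous exceptions */
};

/* Returns the index of the thread context chosen to run the handler,
 * or -1 if every thread context is exempt. The caller supplies no
 * "target" argument: the VPE, not the sender, makes the choice. */
int vpe_select_handler_tc(const struct tc_state *tcs, size_t n)
{
    assert(tcs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!tcs[i].exempt)
            return (int)i;  /* assumed policy: first eligible TC wins */
    }
    return -1;
}
```

The absence of a target parameter is the point of the sketch: nothing in this interface lets one thread context name the thread context that should take the exception.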
- this is problematic for MP operating systems, such as Linux SMP, that rely on the ability of one CPU to direct an inter-processor interrupt to another CPU in response to a timer interrupt request in order to accomplish preemptive multitasked process scheduling.
- each thread context may not have its own translation lookaside buffer (TLB) or floating point coprocessor.
- what is needed is a way to make each lightweight thread context to which is replicated less than the full architected CPU state anticipated by an existing MP operating system (such as a MIPS MT ASE thread context) appear as an architected CPU to the MP operating system, such as Linux SMP or other MP derivatives of UNIX-style operating systems.
- the present invention describes modifications to existing SMP operating systems that enable highly scalable, lightweight thread contexts within a multithreaded processor, which would normally by themselves be unable to run an image, or instance, of the operating system, to function as physical CPUs for the purposes of operating system resource management.
- the present invention provides a multiprocessing system.
- the system includes a multithreading microprocessor having a plurality of thread contexts (TCs), a translation lookaside buffer (TLB) shared by the plurality of TCs, and an instruction scheduler, coupled to the plurality of TCs, configured to dispatch to execution units, in a multithreaded fashion, instructions of threads executing on the plurality of TCs.
- the system also includes a multiprocessor operating system (OS), configured to schedule execution of the threads on the plurality of TCs, wherein a thread of the threads executing on one of the plurality of TCs is configured to update the shared TLB, and prior to updating the TLB to disable interrupts, to prevent the OS from unscheduling the TLB-updating thread from executing on the plurality of TCs, and disable the instruction scheduler from dispatching instructions from any of the plurality of TCs except from the one of the plurality of TCs on which the TLB-updating thread is executing.
- the present invention provides a method for a multiprocessor operating system (OS) to run on a multiprocessing system including a multithreading microprocessor having a plurality of thread contexts (TCs), a translation lookaside buffer (TLB) shared by the plurality of TCs, and an instruction scheduler configured to dispatch to execution units instructions of threads executing on the plurality of TCs in a multithreaded fashion.
- the method includes scheduling execution of the threads on the plurality of TCs, wherein a thread of the threads executing on one of the plurality of TCs is configured for updating the shared TLB, disabling interrupts, prior to the updating the TLB, to prevent the OS from unscheduling the TLB-updating thread from executing on the plurality of TCs, and disabling the instruction scheduler, prior to the updating the TLB, from dispatching instructions from any of the plurality of TCs except from the one of the plurality of TCs on which the TLB-updating thread is executing.
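The ordering of steps in this method can be sketched in C against a simulated core. This is a minimal model, assuming a single flag for interrupts and a single flag for multithreaded dispatch; `struct core`, `struct tlb_entry`, `TLB_ENTRIES`, and all function names are hypothetical, and restoring the two flags in reverse order afterwards is an assumption rather than something the text states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16  /* hypothetical TLB size */

struct tlb_entry {
    uint32_t vpn;    /* virtual page number */
    uint32_t pfn;    /* physical frame number */
    uint8_t  asid;   /* address space identifier */
    bool     valid;
};

/* Hypothetical model of the state shared by all thread contexts. */
struct core {
    bool interrupts_enabled;      /* OS may preempt the current thread */
    bool multithreading_enabled;  /* scheduler dispatches from all TCs */
    struct tlb_entry tlb[TLB_ENTRIES];
};

static bool disable_interrupts(struct core *c)
{
    bool was = c->interrupts_enabled;
    c->interrupts_enabled = false;
    return was;
}

static bool disable_multithreading(struct core *c)
{
    bool was = c->multithreading_enabled;
    c->multithreading_enabled = false;
    return was;
}

/* Update one entry of the shared TLB from the calling thread. */
void shared_tlb_write(struct core *c, unsigned index, struct tlb_entry e)
{
    assert(c != 0 && index < TLB_ENTRIES);

    /* 1. Keep the OS from unscheduling this thread mid-update. */
    bool irq = disable_interrupts(c);
    /* 2. Keep the scheduler from dispatching any other TC. */
    bool mt = disable_multithreading(c);

    /* The update must only happen with both protections in place. */
    assert(!c->interrupts_enabled && !c->multithreading_enabled);
    c->tlb[index] = e;

    /* 3. Restore the previous state (reverse order is an assumption). */
    c->multithreading_enabled = mt;
    c->interrupts_enabled = irq;
}
```

Saving and restoring the prior flag values, rather than unconditionally re-enabling, keeps the sketch safe to call from a context where interrupts or multithreading were already disabled.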
- the present invention provides a computer program product for use with a computing device, the computer program product including a computer usable medium, having computer readable program code embodied in the medium, for causing a method for a multiprocessor operating system (OS) to run on a multiprocessing system including a multithreading microprocessor having a plurality of thread contexts (TCs), a translation lookaside buffer (TLB) shared by the plurality of TCs, and an instruction scheduler configured to dispatch to execution units instructions of threads executing on the plurality of TCs in a multithreaded fashion.
- the computer readable program code includes first program code for providing a step of scheduling execution of the threads on the plurality of TCs, wherein a thread of the threads executing on one of the plurality of TCs is configured for updating the shared TLB, second program code for providing a step of disabling interrupts, prior to the updating the TLB, to prevent the OS from unscheduling the TLB-updating thread from executing on the plurality of TCs, and third program code for providing a step of disabling the instruction scheduler, prior to the updating the TLB, from dispatching instructions from any of the plurality of TCs except from the one of the plurality of TCs on which the TLB-updating thread is executing.
- the present invention provides a method for providing operating system software for running on a multiprocessing system including a multithreading microprocessor having a plurality of thread contexts (TCs), a translation lookaside buffer (TLB) shared by the plurality of TCs, and an instruction scheduler configured to dispatch to execution units instructions of threads executing on the plurality of TCs in a multithreaded fashion.
- the method includes providing computer-readable program code describing the operating system software.
- the program code includes first program code for providing a step of scheduling execution of the threads on the plurality of TCs, wherein a thread of the threads executing on one of the plurality of TCs is configured for updating the shared TLB, second program code for providing a step of disabling interrupts, prior to the updating the TLB, to prevent the OS from unscheduling the TLB-updating thread from executing on the plurality of TCs, and third program code for providing a step of disabling the instruction scheduler, prior to the updating the TLB, from dispatching instructions from any of the plurality of TCs except from the one of the plurality of TCs on which the TLB-updating thread is executing.
- the method also includes transmitting the computer-readable program code as a computer data signal on a network.
- An advantage of the present invention is that it allows an SMP operating system, configured as if it were running on a relatively large number of symmetric CPUs, to run on a multithreaded processor, because each “CPU” is associated with a thread context that is very lightweight in terms of chip area and power consumption and therefore highly scalable.
- the thread contexts are lightweight because they do not each comprise the entire architectural state associated with an independent symmetric CPU; rather, the thread contexts have some architectural state replicated to each of them (such as a program counter and general purpose register set), but also share much of the architectural state between them (such as a TLB and interrupt control logic), which requires modifications to the SMP operating system to enable the number of operating system CPUs to be equal to the number of thread contexts. Consequently, an existing body of coarse-grain multithreading technology embodied in SMP operating systems, such as multithreading telematics, robotics, or multimedia applications, may be exploited on such a highly scalable processor core.
- FIG. 1 is a block diagram illustrating a microprocessor according to the present invention.
- FIG. 2 is a block diagram illustrating in more detail the microprocessor of FIG. 1.
- FIG. 3 is a block diagram illustrating an MFTR instruction executed by the microprocessor of FIG. 1 according to the present invention.
- FIG. 4 is a block diagram illustrating an MTTR instruction executed by the microprocessor of FIG. 1 according to the present invention.
- FIG. 5 is a series of block diagrams illustrating various multithreading-related registers of the microprocessor of FIG. 1 according to one embodiment of the present invention.
- FIG. 6 is a block diagram illustrating data paths of the microprocessor for performing the MFTR instruction according to the present invention.
- FIG. 7 is a block diagram illustrating data paths of the microprocessor for performing the MTTR instruction according to the present invention.
- FIG. 8 is a flowchart illustrating operation of the microprocessor to execute the MFTR instruction according to the present invention.
- FIG. 9 is a flowchart illustrating operation of the microprocessor to execute the MTTR instruction according to the present invention.
- FIG. 10 is a flowchart illustrating a method for performing an inter-processor interrupt (IPI) from one thread context to another thread context within a VPE of the microprocessor of FIG. 1 according to the present invention.
- FIG. 11 is a flowchart illustrating a method for performing preemptive process scheduling by a symmetric multiprocessor operating system on the microprocessor of FIG. 1 according to the present invention.
- FIG. 12 is a block diagram illustrating a prior art multiprocessor system.
- FIG. 13 is a block diagram illustrating a multiprocessor system according to the present invention.
- FIG. 14 is a block diagram of a cpu_data array entry in an SMTC Linux operating system according to the present invention.
- FIG. 15 is a flowchart illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 16 is two flowcharts illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 17 is three flowcharts illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 18 is a flowchart illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 19 is two flowcharts and two block diagrams illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 20 is a flowchart illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 21 is a flowchart illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIGS. 22 through 24 are flowcharts illustrating a method for providing software for performing the steps of the present invention and subsequently transmitting the software as a computer data signal over a communication network.
- For a better understanding of exception processing, translation lookaside buffer (TLB) operation, and floating point unit (FPU) coprocessor operation on MIPS architecture processors in general, the reader is referred to MIPS RISC Architecture, by Gerry Kane and Joe Heinrich, published by Prentice Hall, and to See MIPS Run, by Dominic Sweetman, published by Morgan Kaufmann Publishers.
- Embodiments of the present invention are described herein in the context of a processor core that includes the MIPS® MT Application-Specific Extension (ASE) to the MIPS32® Architecture; however, the present invention is not limited to a processor core with said architecture. Rather, the present invention may be implemented in any processor system which includes a plurality of thread contexts for concurrently executing a corresponding plurality of threads, but which does not include an interrupt input for each of the plurality of thread contexts that would allow one thread context to direct an inter-processor interrupt specifically to another thread context.
- the microprocessor 100 includes a virtual multiprocessor (VMP) context 108 and a plurality of virtual processing elements (VPEs) 102 .
- Each VPE 102 includes a VPE context 106 and at least one thread context (TC) 104 .
- the VMP context 108 comprises a collection of storage elements, such as registers or latches, and/or bits in the storage elements of the microprocessor 100 that describe the state of execution of the microprocessor 100 .
- the VMP context 108 stores state related to global resources of the microprocessor 100 that are shared among the VPEs 102, such as the instruction cache 202, instruction fetcher 204, instruction decoder 206, instruction issuer 208, instruction scheduler 216, execution units 212, and data cache 242 of FIG. 2, or other shared elements of the microprocessor 100 pipeline described below.
- the VMP context 108 includes the MVPControl Register 501, MVPConf0 Register 502, and MVPConf1 Register 503 of FIGS. 5B-5D described below.
- a thread context 104 comprises a collection of storage elements, such as registers or latches, and/or bits in the storage elements of the microprocessor 100 that describe the state of execution of a thread, and which enable an operating system to manage the resources of the thread context 104 . That is, the thread context describes the state of its respective thread, which is unique to the thread, rather than state shared with other threads of execution executing concurrently on the microprocessor 100 .
- a thread, also referred to herein as a thread of execution or instruction stream, is a sequence of instructions.
- the microprocessor 100 is a multithreading microprocessor. That is, the microprocessor 100 is configured to concurrently execute multiple threads of execution.
- By storing the state of each thread in the multiple thread contexts 104, the microprocessor 100 is configured to quickly switch between threads to fetch and issue instructions.
- the elements of a thread context 104 of various embodiments are described below with respect to the remaining Figures.
- the present microprocessor 100 is configured to execute the MFTR instruction 300 of FIG. 3 and the MTTR instruction 400 of FIG. 4 for moving thread context 104 information between the various thread contexts 104 , as described in detail herein.
- the VPE context 106 includes a collection of storage elements, such as registers or latches, and/or bits in the storage elements of the microprocessor 100 that describe the state of execution of a VPE 102 , which enable an operating system to manage the resources of the VPE 102 , such as virtual memory, caches, exceptions, and other configuration and status information. Consequently, a microprocessor 100 with N VPEs 102 may appear to an operating system as an N-way symmetric multiprocessor. However, as also described herein, a microprocessor 100 with M thread contexts 104 may appear to an operating system as an M-way symmetric multiprocessor, such as shown with respect to FIG. 13 . In particular, threads running on the thread contexts 104 may include MFTR instructions 300 and MTTR instructions 400 to read and write another thread context 104 to emulate a directed exception, such as an inter-processor interrupt, as described herein.
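One plausible shape for emulating a directed interrupt with cross-thread-context reads and writes is sketched below. The passage above does not give the actual sequence, so every step here is an assumption: the sender halts the target, saves the target's program counter, redirects it to a handler, and lets it run again. The C functions `mftr` and `mttr` merely stand in for the MFTR and MTTR instructions, and the register names in `enum tc_reg` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TCS 4  /* hypothetical number of thread contexts */

/* Hypothetical per-thread-context registers reachable from another TC. */
enum tc_reg { TC_REG_PC, TC_REG_SAVED_PC, TC_REG_HALT, TC_REG_COUNT };

static uint32_t tc_regs[NUM_TCS][TC_REG_COUNT];

/* Stand-in for MFTR: read a register of another thread context. */
uint32_t mftr(int tc, enum tc_reg reg)
{
    assert(tc >= 0 && tc < NUM_TCS && reg < TC_REG_COUNT);
    return tc_regs[tc][reg];
}

/* Stand-in for MTTR: write a register of another thread context. */
void mttr(int tc, enum tc_reg reg, uint32_t value)
{
    assert(tc >= 0 && tc < NUM_TCS && reg < TC_REG_COUNT);
    tc_regs[tc][reg] = value;
}

/* Assumed emulation sequence for a directed interrupt to `target`. */
void emulate_ipi(int target, uint32_t handler_pc)
{
    mttr(target, TC_REG_HALT, 1);                  /* stop the target   */
    mttr(target, TC_REG_SAVED_PC,
         mftr(target, TC_REG_PC));                 /* remember its PC   */
    mttr(target, TC_REG_PC, handler_pc);           /* redirect it       */
    mttr(target, TC_REG_HALT, 0);                  /* let it run again  */
}
```

In this model the target resumes at `handler_pc` and can later return to the saved program counter, which is the behavior a real directed interrupt would provide in hardware.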
- the VPEs 102 share various resources of the microprocessor 100 , such as the instruction cache 202 , instruction fetcher 204 , instruction decoder 206 , instruction issuer 208 , instruction scheduler 216 , execution units 212 , and data cache 242 of FIG. 2 , transparently to the operating system.
- each VPE 102 substantially conforms to a MIPS32 or MIPS64 Instruction Set Architecture (ISA) and a MIPS Privileged Resource Architecture (PRA), and the VPE context 106 includes the MIPS PRA Coprocessor 0 and system state necessary to describe one or more instantiations thereof.
- the VPE context 106 includes the VPEControl Register 504 , VPEConf 0 Register 505 , VPEConf 1 Register 506 , YQMask Register 591 , VPESchedule Register 592 , and VPEScheFBack Register 593 of FIGS. 5E-5H and EPC Register 598 , Status Register 571 , EntryHi Register 526 , Context Register 527 , and Cause Register 536 of FIGS. 5L-5P described below.
- a VPE 102 may be viewed as an exception domain. That is, when an asynchronous exception (such as a hardware or software interrupt) is generated, or when an instruction of one of the thread contexts 104 of a VPE 102 generates a synchronous exception (such as an address error, bus error, or invalid instruction exception), multithreading is suspended on the VPE 102 (i.e., only instructions of the instruction stream associated with the thread context 104 servicing the exception are fetched and issued), and each VPE context 106 includes the state necessary to service the exception. Once the exception is serviced, the exception handler may selectively re-enable multithreading on the VPE 102 .
- the VPE 102 selects one of the eligible (i.e., not marked as exempt from servicing asynchronous exceptions as indicated by the IXMT bit 518 of FIG. 5J ) thread contexts 104 of the VPE 102 to execute the exception handler.
- the manner used by the VPE 102 to select one of the eligible thread contexts is implementation-dependent, such as selecting pseudo-randomly, in a round-robin fashion, or based on the relative priorities of the thread contexts 104 .
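- the selection described above can be sketched in software as follows. This is a minimal illustrative model, not the hardware mechanism: the `tc_model` structure, the `select_exception_tc` function, and the requirement that a candidate be activated are assumptions introduced here, and round-robin is only one of the implementation-dependent policies mentioned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical software model of one thread context (TC). */
struct tc_model {
    bool activated; /* assumption: only a TC holding a valid thread may be chosen */
    bool ixmt;      /* IXMT set: exempt from servicing asynchronous exceptions */
};

/*
 * Round-robin choice: start just after the TC that serviced the previous
 * exception and return the index of the first eligible TC, or -1 if no TC
 * of the VPE is eligible.
 */
int select_exception_tc(const struct tc_model *tcs, size_t n, size_t last)
{
    assert(tcs != NULL && n > 0 && last < n);
    for (size_t i = 1; i <= n; i++) {
        size_t cand = (last + i) % n;
        if (tcs[cand].activated && !tcs[cand].ixmt)
            return (int)cand;
    }
    return -1;
}
```

Note that the caller, not the exception, determines nothing about which thread context is returned; the choice depends only on eligibility and the rotation point.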
- the asynchronous exception itself does not specify which thread context 104 of the VPE 102 is to handle the exception.
- the microprocessor 100 does not provide a hardware exception mechanism for one thread context 104 to direct an asynchronous exception to another specific thread context 104 .
- the present invention provides a method for operating system software to emulate one thread context 104 directing an asynchronous exception to another specific thread context 104 , as described herein.
- the microprocessor 100 is a pipelined microprocessor comprising a plurality of pipeline stages.
- the microprocessor 100 includes a plurality of thread contexts 104 of FIG. 1 .
- the embodiment of FIG. 2 shows four thread contexts 104 ; however, it should be understood that the number four is chosen only for illustration purposes, and the microprocessor 100 described herein embodying the present invention is susceptible to any number of thread contexts 104 .
- the number of thread contexts 104 may be up to 256.
- a microprocessor 100 may include multiple VPEs 102 , each having multiple thread contexts 104 .
- each thread context 104 comprises a program counter (PC) 222 for storing an address for fetching a next instruction in the associated instruction stream, a general purpose register (GPR) set 224 for storing intermediate execution results of the instruction stream issuing from the thread context based on the program counter 222 value, and other per-thread context 226 .
- the microprocessor 100 includes a multiplier unit, and the other thread context 226 includes registers for storing results of the multiplier unit specifically associated with multiply instructions in the instruction stream.
- the other thread context 226 includes information for uniquely identifying each thread context 104 .
- the thread identification information includes information for specifying the execution privilege level of the associated thread, such as whether the thread is a kernel, supervisor, or user level thread, such as is stored in the TKSU bits 589 of the TCStatus Register 508 of FIG. 5J .
- the thread identification information includes information for identifying a task or process comprising the thread.
- the task identification information may be used as an address space identifier (ASID) for purposes of translating virtual addresses into physical addresses, such as is stored in the TASID bits 528 of the TCStatus Register 508 , which are reflected in the EntryHi Register 526 of FIG. 5N .
- the other per-thread context 226 includes the TCStatus Register 508 , TCRestart Register 594 , TCHalt Register 509 , TCContext Register 595 , TCSchedule Register 596 , TCBind Register 556 and TCScheFBack Register 597 of FIGS. 5J-5L .
- the microprocessor 100 includes a scheduler 216 for scheduling execution of the various threads being concurrently executed by the microprocessor 100 .
- the scheduler 216 is coupled to the VMP context 108 and VPE contexts 106 of FIG. 1 and to the other per-thread context 226 .
- the scheduler 216 is responsible for scheduling fetching of instructions from the program counter 222 of the various thread contexts 104 and for scheduling issuing of the fetched instructions to execution units 212 of the microprocessor 100 , as described below.
- the scheduler 216 schedules execution of the threads based on a scheduling policy of the microprocessor 100 .
- the scheduling policy may include, but is not limited to, any of the following scheduling policies.
- the scheduler 216 employs a round-robin, or time-division-multiplexed, or interleaved, scheduling policy that allocates a predetermined number of clock cycles or instruction issue slots to each ready thread in a rotating order.
- the round-robin policy is useful in an application in which fairness is important and a minimum quality of service is required for certain threads, such as real-time application program threads.
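- the round-robin policy above can be illustrated with a short sketch. The function name `rr_slot_owner`, the `ready` array, and the `quantum` parameter are hypothetical; the sketch assumes the set of ready threads does not change while the rotation is evaluated, which real hardware would not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the index of the ready thread that owns issue slot `slot` when each
 * ready thread receives `quantum` consecutive slots in rotating order.
 * Returns -1 if no thread is ready.
 */
int rr_slot_owner(const bool *ready, size_t n, unsigned quantum, unsigned long slot)
{
    assert(ready != NULL && quantum > 0);

    size_t nready = 0;
    for (size_t i = 0; i < n; i++)
        if (ready[i])
            nready++;
    if (nready == 0)
        return -1;

    /* Which turn of the rotation this slot falls in. */
    size_t turn = (size_t)((slot / quantum) % nready);
    for (size_t i = 0; i < n; i++) {
        if (ready[i]) {
            if (turn == 0)
                return (int)i;
            turn--;
        }
    }
    return -1; /* not reached when nready > 0 */
}
```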
- the scheduler 216 employs a blocking scheduling policy wherein the scheduler 216 continues to schedule fetching and issuing of a currently running thread until an event occurs that blocks further progress of the thread, such as a cache miss, a branch misprediction, a data dependency, or a long latency instruction.
- the microprocessor 100 comprises a superscalar pipelined microprocessor, and the scheduler 216 schedules the issue of multiple instructions per clock cycle, and in particular, the issue of instructions from multiple threads per clock cycle, commonly referred to as simultaneous multithreading.
- the microprocessor 100 includes an instruction cache 202 for caching program instructions fetched from a system memory of a system including the microprocessor 100 , such as the MFTR/MTTR 300 / 400 instructions.
- the microprocessor 100 provides virtual memory capability
- the fetch unit 204 includes a translation lookaside buffer (TLB) for caching virtual to physical memory page translations.
- each thread, or program, or task, executing on the microprocessor 100 is assigned a unique task ID, or address space ID (ASID), which is used to perform memory accesses and in particular memory address translations, and a thread context 104 also includes storage for an ASID associated with the thread.
- the various threads executing on the microprocessor 100 share the instruction cache 202 and TLB, as discussed in more detail below.
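- to illustrate why a shared TLB needs the ASID, the following sketch models a lookup in which an entry matches only if both the virtual page number and the ASID agree. The `tlb_entry` layout, the 4 KB page size, and the `tlb_lookup` function are assumptions made for illustration; global entries and variable page sizes are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u /* assumed 4 KB pages */

/* Hypothetical TLB entry: one translation tagged with its address space. */
struct tlb_entry {
    bool     valid;
    uint8_t  asid; /* address space identifier of the owning task */
    uint32_t vpn;  /* virtual page number */
    uint32_t pfn;  /* physical frame number */
};

/*
 * Translate vaddr for the given ASID. Returns true and stores the physical
 * address on a hit; returns false on a miss.
 */
bool tlb_lookup(const struct tlb_entry *tlb, size_t n, uint8_t asid,
                uint32_t vaddr, uint32_t *paddr)
{
    assert(tlb != NULL && paddr != NULL);
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t offset = vaddr & ((UINT32_C(1) << PAGE_SHIFT) - 1u);

    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].pfn << PAGE_SHIFT) | offset;
            return true;
        }
    }
    return false;
}
```

Two threads with different ASIDs can therefore hold translations for the same virtual page in the shared TLB without interfering with each other.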
- the microprocessor 100 also includes a fetch unit 204 , coupled to the instruction cache 202 , for fetching program instructions, such as MFTR/MTTR 300 / 400 instructions, from the instruction cache 202 and system memory.
- the fetch unit 204 fetches instructions at an instruction fetch address provided by a multiplexer 244 .
- the multiplexer 244 receives a plurality of instruction fetch addresses from the corresponding plurality of program counters 222 .
- Each of the program counters 222 stores a current instruction fetch address for a different program thread.
- FIG. 2 illustrates four different program counters 222 associated with four different threads.
- the multiplexer 244 selects one of the four program counters 222 based on a selection input provided by the scheduler 216 . In one embodiment, the various threads executing on the microprocessor 100 share the fetch unit 204 .
- the microprocessor 100 also includes a decode unit 206 , coupled to the fetch unit 204 , for decoding program instructions fetched by the fetch unit 204 , such as MFTR/MTTR 300 / 400 instructions.
- the decode unit 206 decodes the opcode, operand, and other fields of the instructions.
- the various threads executing on the microprocessor 100 share the decode unit 206 .
- the microprocessor 100 also includes execution units 212 for executing instructions.
- the execution units 212 may include but are not limited to one or more integer units for performing integer arithmetic, Boolean operations, shift operations, rotate operations, and the like; floating point units for performing floating point operations; load/store units for performing memory accesses and in particular accesses to a data cache 242 coupled to the execution units 212 ; and a branch resolution unit for resolving the outcome and target address of branch instructions.
- the data cache 242 includes a translation lookaside buffer (TLB) for caching virtual to physical memory page translations, which is shared by the various thread contexts, as described in more detail below.
- In addition to the operands received from the data cache 242 , the execution units 212 also receive operands from registers of the general purpose register sets 224 . In particular, an execution unit 212 receives operands from a register set 224 of the thread context 104 allocated to the thread to which the instruction belongs. A multiplexer 248 selects operands from the appropriate register set 224 for provision to the execution units 212 . In addition, the multiplexer 248 receives data from each of the other per-thread contexts 226 and program counters 222 , for selective provision to the execution units 212 based on the thread context 104 of the instruction being executed by the execution unit 212 . In one embodiment, the various execution units 212 may concurrently execute instructions from multiple concurrent threads.
- the microprocessor 100 also includes an instruction issue unit 208 , coupled to the scheduler 216 and coupled between the decode unit 206 and the execution units 212 , for issuing instructions to the execution units 212 as instructed by the scheduler 216 and in response to information about the instructions decoded by the decode unit 206 .
- the instruction issue unit 208 ensures that instructions are not issued to the execution units 212 if they have data dependencies on other instructions previously issued to the execution units 212 .
- an instruction queue is interposed between the decode unit 206 and the instruction issue unit 208 for buffering instructions awaiting issue to the execution units 212 , to reduce the likelihood of starvation of the execution units 212 .
- the various threads executing on the microprocessor 100 share the instruction issue unit 208 .
- the microprocessor 100 also includes a write-back unit 214 , coupled to the execution units 212 , for writing back results of instructions into the general purpose register sets 224 , program counters 222 , and other thread contexts 226 .
- a demultiplexer 246 receives the instruction result from the write-back unit 214 and stores the instruction result into the appropriate register set 224 , program counters 222 , and other thread contexts 226 associated with the instruction's thread.
- the instruction results are also provided for storage into the VPE contexts 106 and the VMP context 108 .
- Referring now to FIG. 3 , a block diagram illustrating an MFTR instruction 300 executed by the microprocessor 100 of FIG. 1 according to the present invention is shown.
- FIG. 3 comprises FIG. 3A illustrating the format and function of the MFTR instruction 300 , and FIG. 3B illustrating a table 350 specifying selection of the MFTR instruction 300 source register 324 based on its operand values.
- the mnemonic for the MFTR instruction 300 is MFTR rt, rd, u, sel, h as shown.
- the MFTR instruction 300 instructs the microprocessor 100 to copy the contents of a source register 324 of a target thread context 104 to a destination register 322 of an issuing thread context 104 .
- Bits 11 - 15 are an rd field 308 , which specifies an rd register 322 , or destination register 322 , within the general purpose register set 224 of FIG. 2 of the thread context 104 from which the MFTR instruction 300 is issued, referred to herein as the issuing thread context.
- the destination register 322 is one of 32 general purpose registers of the MIPS ISA.
- Bits 16 - 20 , 6 - 10 , 5 , 4 , and 0 - 2 are an rt field 306 , rx field 312 , u field 314 , h field 316 , and sel field 318 , respectively, which collectively are used to specify a source register 324 of a thread context 104 distinct from the issuing thread context, referred to herein as the target thread context 104 .
- the use of the rt field 306 , rx field 312 , u field 314 , h field 316 , and sel field 318 to specify the source register 324 is described in detail in table 350 of FIG. 3B .
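- the field layout described above can be sketched as a decoder. The bit positions are those stated for the rt, rd, rx, u, h, and sel fields; the names `mftr_fields`, `extract_bits`, and `mftr_decode` are hypothetical, and the opcode and sub-opcode bits are not examined.

```c
#include <assert.h>
#include <stdint.h>

/* Operand fields of an MFTR instruction word. */
struct mftr_fields {
    unsigned rt;  /* bits 16-20 */
    unsigned rd;  /* bits 11-15 */
    unsigned rx;  /* bits 6-10  */
    unsigned u;   /* bit 5      */
    unsigned h;   /* bit 4      */
    unsigned sel; /* bits 0-2   */
};

/* Extract `width` bits of `word` starting at bit `lo` (hypothetical helper). */
static unsigned extract_bits(uint32_t word, unsigned lo, unsigned width)
{
    assert(width > 0 && width < 32 && lo + width <= 32);
    return (unsigned)((word >> lo) & ((UINT32_C(1) << width) - 1u));
}

/* Split a 32-bit instruction word into its MFTR operand fields. */
struct mftr_fields mftr_decode(uint32_t insn)
{
    struct mftr_fields f;
    f.rt  = extract_bits(insn, 16, 5);
    f.rd  = extract_bits(insn, 11, 5);
    f.rx  = extract_bits(insn, 6, 5);
    f.u   = extract_bits(insn, 5, 1);
    f.h   = extract_bits(insn, 4, 1);
    f.sel = extract_bits(insn, 0, 3);
    return f;
}
```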
- the microprocessor 100 includes one or more processor control coprocessors, referred to in the MIPS PRA as Coprocessor 0 , or CP 0 , or Cop 0 , denoted 602 in FIGS. 6 and 8 , which is generally used to perform various microprocessor 100 configuration and control functions, such as cache control, exception control, memory management unit control, and particularly multithreading control and configuration.
- as shown in table 350 , a u field 314 value of 0 selects one of the CP 0 registers as the MFTR instruction 300 source register 324 .
- FIG. 5A illustrates the particular rt field 306 (or rd 308 in the case of MTTR 400 ) and sel field 318 values used to select the various multithreading-related CP 0 registers.
- a u field 314 value of 1 and a sel field 318 value of 0 selects one of the general purpose registers 224 of FIG. 2 , selected by the rt field 306 value, as the MFTR instruction 300 source register 324 .
- the microprocessor 100 includes a digital signal processor (DSP) arithmetic unit or multiplier for performing common DSP-related arithmetic operations
- each thread context 104 includes four accumulators for storing the TC-specific results of the arithmetic operations and a DSPControl register of the DSP accumulators, denoted 224 in FIGS. 6 and 8 .
- a u field 314 value of 1 and a sel field 318 value of 1 selects as the MFTR instruction 300 source register 324 one of the DSP accumulator registers or the DSPControl register, selected by the rt field 306 value, as shown.
- the microprocessor 100 includes one or more floating point or multimedia coprocessors, referred to in the MIPS PRA as Coprocessor 1 , or CP 1 , or Cop 1 , denoted 604 in FIGS. 6 and 8 .
- as shown in table 350 , a u field 314 value of 1 and a sel field 318 value of 2 selects as the MFTR instruction 300 source register 324 one of the floating point unit data registers (FPR) selected by the rt field 306 value; furthermore, a sel field 318 value of 3 selects as the MFTR instruction 300 source register 324 one of the floating point unit control registers (FPCR) selected by the rt field 306 value.
- the microprocessor 100 includes one or more implementation-specific coprocessors, referred to in the MIPS PRA as Coprocessor 2 , or CP 2 , or Cop 2 , denoted 606 in FIGS. 6 and 8 .
- as shown in table 350 , a u field 314 value of 1 and a sel field 318 value of 4 selects as the MFTR instruction 300 source register 324 one of the CP 2 data registers (Cop 2 Data) selected by the concatenation of the rx field 312 value and the rt field 306 value; furthermore, a sel field 318 value of 5 selects as the MFTR instruction 300 source register 324 one of the CP 2 control registers (Cop 2 Control) selected by the concatenation of the rx field 312 value and the rt field 306 value.
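- the selections of table 350 described above can be summarized in a small lookup. The enumerator names and the functions `mftr_source_class` and `cp2_index` are hypothetical; treating sel values 6 and 7 as invalid, and placing the rx value in the upper bits of the concatenation, are assumptions not stated in the text.

```c
#include <assert.h>

/* Register classes selectable as the MFTR source register. */
enum reg_class {
    REG_CP0,      /* u = 0: Coprocessor 0 register */
    REG_GPR,      /* u = 1, sel = 0: general purpose register */
    REG_DSP,      /* u = 1, sel = 1: DSP accumulator or DSPControl */
    REG_FPR,      /* u = 1, sel = 2: floating point data register */
    REG_FPCR,     /* u = 1, sel = 3: floating point control register */
    REG_CP2_DATA, /* u = 1, sel = 4: Coprocessor 2 data register */
    REG_CP2_CTRL, /* u = 1, sel = 5: Coprocessor 2 control register */
    REG_INVALID   /* assumption: remaining sel values select nothing */
};

/* Map the u and sel operands to the class of the source register. */
enum reg_class mftr_source_class(unsigned u, unsigned sel)
{
    assert(u <= 1 && sel <= 7);
    if (u == 0)
        return REG_CP0;
    switch (sel) {
    case 0: return REG_GPR;
    case 1: return REG_DSP;
    case 2: return REG_FPR;
    case 3: return REG_FPCR;
    case 4: return REG_CP2_DATA;
    case 5: return REG_CP2_CTRL;
    default: return REG_INVALID;
    }
}

/* Coprocessor 2 register number: rx concatenated with rt (5 bits each). */
unsigned cp2_index(unsigned rx, unsigned rt)
{
    assert(rx < 32 && rt < 32);
    return (rx << 5) | rt;
}
```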
- the source register 324 is further specified by a TargTC operand 332 .
- the TargTC 332 operand specifies the target thread context 104 containing the source register 324 .
- the TargTC operand 332 is stored in the VPEControl Register 504 of FIG. 5E . If the source register 324 is a per-VPE 102 register, the source register 324 is of the VPE 102 to which the target thread context 104 is bound, as specified by the CurVPE field 558 of the TCBind Register 556 of FIG. 5K .
- Referring now to FIG. 4 , a block diagram illustrating an MTTR instruction 400 executed by the microprocessor 100 of FIG. 1 according to the present invention is shown.
- FIG. 4 comprises FIG. 4A illustrating the format and function of the MTTR instruction 400 , and FIG. 4B illustrating a table 450 specifying selection of the MTTR instruction 400 destination register 422 based on its operand values.
- the various fields of the MTTR instruction 400 are identical to the fields of the MFTR instruction 300 , except that the value of the sub-opcode field 404 is different, and the use of the rt field 306 and rd field 308 is reversed, i.e., the rt field 306 is used by the MTTR instruction 400 to select the source register 424 and the rd field 308 is used, along with the rx 312 , u 314 , h 316 , and sel 318 fields, to select the destination register 422 within the thread context 104 specified by the TargTC 332 operand, as shown in FIG. 4 .
- the MTTR instruction 400 instructs the microprocessor 100 to copy the contents of a source register 424 of the issuing thread context 104 to a destination register 422 of the target thread context 104 .
- Referring now to FIG. 5 , a series of block diagrams illustrating various multithreading-related registers of the microprocessor 100 of FIG. 1 according to one embodiment of the present invention is shown.
- FIG. 5 comprises FIGS. 5A-5P .
- the registers of FIG. 5 are comprised in CP 0 602 of FIGS. 6 and 8 .
- FIG. 5A is a table 500 indicating the particular rt field 306 (or rd 308 in the case of MTTR 400 ) and sel field 318 values used to select the various multithreading-related CP 0 registers 602 .
- some of the registers are included in the VMP context 108 of FIG.
- FIGS. 5B-5P include an illustration of the fields of each of the multithreading registers and a table describing the various fields. Fields of particular relevance are discussed in more detail herein.
- the registers of one thread context (i.e., the target thread context 104 ) may be read or written by another thread context 104 (i.e., the issuing thread context 104 ) that executes an MFTR 300 or MTTR 400 instruction, respectively, depending upon the readability or writeability of the particular register or bits thereof.
- the EVP bit 513 of FIG. 5B controls whether the microprocessor 100 is executing as a virtual multiprocessor, i.e., if multiple VPEs 102 may concurrently fetch and issue instructions from distinct threads of execution.
- the PVPE field 524 of FIG. 5C specifies the total number of VPEs 102 , i.e., the total number of VPE contexts 106 , instantiated in the microprocessor 100 . In the embodiment of FIG. 5 , up to sixteen VPEs 102 may be instantiated in the microprocessor 100 .
- the PTC field 525 of FIG. 5C specifies the total number of thread contexts 104 instantiated in the microprocessor 100 . In the embodiment of FIG.
- up to 256 thread contexts 104 may be instantiated in the microprocessor 100 .
- the TE bit 543 of FIG. 5E controls whether multithreading is enabled or disabled within a VPE 102 .
- the effect of clearing the EVP bit 513 and TE bit 543 may not be instantaneous; consequently the operating system should execute a hazard barrier instruction to ensure that all VPEs 102 and thread contexts 104 , respectively, have been quiesced.
- the TargTC field 332 of FIG. 5E is used by an issuing thread context 104 to specify the thread context 104 that contains the source register 324 in the case of an MFTR instruction 300 or the destination register 422 in the case of an MTTR instruction 400 .
- the issuing thread context 104 executes an instruction prior to the MFTR/MTTR instruction 300 / 400 to populate the TargTC 332 field of the VPEControl Register 504 .
- a single TargTC 332 value per VPE 102 is sufficient since multithreading must be disabled on the VPE 102 issuing the MFTR/MTTR 300 / 400 instruction; hence, none of the other thread contexts 104 of the VPE 102 may be using the TargTC 332 field of the VPEControl Register 504 of the issuing VPE 102 .
- the TargTC 332 value may be provided within a field of the MFTR/MTTR 300 / 400 instructions.
- the TargTC field 332 is used to specify the target thread context 104 independent of the VPE 102 to which the target thread context 104 is bound.
- Each thread context 104 in the microprocessor 100 has a unique number, or identifier, specified in the CurTC field 557 of the TCBind Register 556 of FIG. 5K , with values 0 through N-1, where N is the number of instantiated thread contexts 104 , and N may be up to 256. If the target register (source register 324 of an MFTR instruction 300 , or destination register 422 of an MTTR instruction 400 ) is a per-TC register, then the target register is in the thread context 104 specified by the TargTC 332 value; if the target register is a per-VPE register, then the target register is in the VPE 102 to which the thread context 104 specified in the TargTC 332 is bound.
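- the resolution of the target register described above can be sketched as follows. The `cpu_model` structure, the array sizes, and the `resolve_target` function are hypothetical; one per-TC register (TCStatus) and one per-VPE register (Status) stand in for the full register sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TC  4u /* illustrative sizes only */
#define NUM_VPE 2u

struct tc_regs {
    uint32_t tcstatus; /* example per-TC register */
    unsigned cur_vpe;  /* CurVPE field of TCBind: VPE this TC is bound to */
};

struct vpe_regs {
    uint32_t status;   /* example per-VPE register */
    unsigned targ_tc;  /* TargTC field of VPEControl */
};

struct cpu_model {
    struct tc_regs  tc[NUM_TC];
    struct vpe_regs vpe[NUM_VPE];
};

/*
 * Locate the register an MFTR/MTTR issued on `issuing_vpe` would access.
 * A per-TC register lives in the TC named by TargTC; a per-VPE register
 * lives in the VPE to which that TC is bound.
 */
uint32_t *resolve_target(struct cpu_model *cpu, unsigned issuing_vpe, bool per_vpe)
{
    assert(cpu != NULL && issuing_vpe < NUM_VPE);
    unsigned targ = cpu->vpe[issuing_vpe].targ_tc;
    assert(targ < NUM_TC);

    if (per_vpe) {
        unsigned bound = cpu->tc[targ].cur_vpe;
        assert(bound < NUM_VPE);
        return &cpu->vpe[bound].status;
    }
    return &cpu->tc[targ].tcstatus;
}
```

The sketch shows that the issuing VPE and the VPE owning the target register need not be the same, since TargTC names the thread context independent of its binding.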
- the TCU 0 . . . TCU 3 bits 581 of the TCStatus Register 508 of FIG. 5J control and indicate whether the thread context 104 controls access to its VPE's 102 Coprocessor 0 , 1 , 2 , or 3 , respectively.
- the TCU 0 . . . TCU 3 bits 581 and TKSU bits 589 of the TCStatus Register 508 correspond to the CU 0 . . . CU 3 bits 572 and the KSU bits 574 , respectively, of the Status Register 571 of FIG. 5M ; and the TASID bits 528 of the TCStatus Register 508 correspond to the ASID bits 538 of the Coprocessor 0 EntryHi Register 526 of FIG.
- the TCContext Register 595 of FIG. 5L is a read/write register usable by the operating system as a pointer to a thread context-specific storage area in memory, such as a thread context control block.
- the TCContext Register 595 may be used by the operating system, for example, to save and restore state of a thread context 104 when the program thread associated with the thread context 104 must be swapped out for use by another program thread.
- the RNST bits 582 of the TCStatus Register 508 indicate the state of the thread context 104 , namely whether the thread context 104 is running or blocked, and if blocked the reason for blockage.
- the RNST 582 value is only stable when read by an MFTR instruction 300 if the target thread context 104 is in a halted state, which is described below; otherwise, the RNST 582 value may change asynchronously and unpredictably.
- the microprocessor 100 will fetch and issue instructions from the thread of execution specified by the thread context 104 program counter 222 according to the scheduler 216 scheduling policy.
- a thread context 104 may be halted if the H bit 599 of the TCHalt Register 509 of FIG. 5K is set. That is, a first thread context 104 running an operating system thread may halt a second thread context 104 by writing a 1 to the H bit 599 of the TCHalt Register 509 of the second thread context 104 .
- a free thread context 104 has no valid content and the microprocessor 100 does not schedule instructions of a free thread context 104 to be fetched or issued. The microprocessor 100 schedules instructions of an activated thread context 104 to be fetched and issued from the activated thread context 104 program counter 222 .
- the microprocessor 100 schedules only activated thread contexts 104 .
- the microprocessor 100 allows the operating system to allocate only free thread contexts 104 to create new threads. Setting the H bit 599 of an activated thread context 104 causes the thread context 104 to cease fetching instructions and causes the restart address 549 in the TCRestart register 594 of FIG. 5K to be loaded with the address of the next instruction to be issued for the thread context 104 . Only a thread context 104 in a halted state is guaranteed to be stable as seen by other thread contexts 104 , i.e., when examined by an MFTR instruction 300 .
- Multithreaded execution may be temporarily inhibited on a VPE 102 due to exceptions or explicit software interventions, but activated thread contexts 104 that are inhibited in such cases are considered to be suspended, rather than implicitly halted.
- a suspended thread context 104 is inhibited from any action which might cause exceptions or otherwise change global VPE 102 privileged resource state, but unlike a halted thread, a suspended thread context 104 may still have instructions active in the pipeline; consequently, the suspended thread context 104 , including general purpose registers 224 values, may still be unstable; therefore, the thread context 104 should not be examined by an MFTR instruction 300 until the thread context 104 is halted.
- the effect of clearing the H bit 599 may not be instantaneous; consequently the operating system should execute a hazard barrier instruction to ensure that the target thread context has been quiesced.
- the TCRestart Register 594 may be read to obtain the address 549 of the instruction at which the microprocessor 100 will resume execution of the thread context 104 when the thread context 104 is restarted.
- the restart address 549 will advance beyond the address of the branch or jump instruction only after the instruction in the delay slot has been retired. If the thread context 104 is halted between the execution of a branch instruction and the associated delay slot instruction, the branch delay slot is indicated by the TDS bit 584 of the TCStatus Register 508 .
- the TCRestart register 594 can be written while its thread context 104 is halted to change the address at which the thread context 104 will restart. Furthermore, a first thread context 104 running an operating system thread may restart a second thread context 104 by writing a 0 to the H bit 599 of the TCHalt Register 509 of the second thread context 104 . Clearing the H bit 599 of an activated thread context 104 allows the thread context 104 to be scheduled, and begin fetching and executing instructions at its restart address 549 specified in its TCRestart register 594 .
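- the halt, redirect, and restart sequence described above can be modeled in software as follows. This is an illustrative sketch only: the `tc_state` structure and the functions are hypothetical, the model ignores pipeline hazards and branch delay slots, and rejecting a TCRestart write to a running thread context is an assumption of the model rather than stated hardware behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state involved in halting a thread context. */
struct tc_state {
    bool     halted;  /* H bit of TCHalt */
    uint32_t pc;      /* fetch address while running */
    uint32_t restart; /* TCRestart: address at which the TC resumes */
};

/* Write the H bit: setting it captures the restart address, clearing it resumes. */
void tc_write_halt(struct tc_state *tc, bool h)
{
    assert(tc != NULL);
    if (h && !tc->halted)
        tc->restart = tc->pc;  /* next instruction to be issued */
    else if (!h && tc->halted)
        tc->pc = tc->restart;  /* fetching begins at TCRestart */
    tc->halted = h;
}

/* Write TCRestart; the model accepts the write only while the TC is halted. */
bool tc_write_restart(struct tc_state *tc, uint32_t addr)
{
    assert(tc != NULL);
    if (!tc->halted)
        return false;
    tc->restart = addr;
    return true;
}

/* Halt a TC, save where it was, point it at `handler`, and let it run again. */
void redirect_tc(struct tc_state *tc, uint32_t handler, uint32_t *saved)
{
    assert(tc != NULL && saved != NULL);
    tc_write_halt(tc, true);
    *saved = tc->restart;
    bool ok = tc_write_restart(tc, handler);
    assert(ok);
    (void)ok;
    tc_write_halt(tc, false);
}
```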
- the Coprocessor 0 EPC Register 598 of FIG. 5L contains the address at which the exception servicing thread context 104 will resume execution after an exception has been serviced and the thread context 104 executes an ERET (exception return) instruction. That is, when the thread running on the thread context 104 executes an ERET instruction, the VPE 102 reads the EPC Register 598 to determine the address at which to begin fetching and issuing instructions. Unless the EXL bit 576 of the Status Register 571 of FIG. 5M is already set, the microprocessor 100 writes the EPC Register 598 when an exception is raised.
- For synchronous exceptions, the microprocessor 100 writes the address of the instruction that was the direct cause of the exception, or the address of the immediately preceding branch or jump instruction, if the exception-causing instruction is in a branch delay slot. For asynchronous exceptions, the microprocessor 100 writes the address of the instruction at which execution will be resumed.
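- the rule for the value written to the EPC Register 598 can be sketched as follows. The `exc_info` structure and the `write_epc` function are hypothetical, and the sketch assumes fixed 4-byte instructions, so that the preceding branch or jump lies 4 bytes before the delay slot instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a raised exception. */
struct exc_info {
    bool     asynchronous;  /* true for interrupts and similar */
    bool     in_delay_slot; /* synchronous only: faulting insn is in a delay slot */
    uint32_t addr;          /* faulting insn (synchronous) or resume address (asynchronous) */
};

/*
 * Model the EPC update. EPC is written only if EXL is not already set.
 * Returns true if EPC was written.
 */
bool write_epc(uint32_t *epc, bool exl_set, const struct exc_info *e)
{
    assert(epc != NULL && e != NULL);
    if (exl_set)
        return false; /* EPC keeps its earlier value */

    if (!e->asynchronous && e->in_delay_slot)
        *epc = e->addr - 4u; /* preceding branch or jump (assumes 4-byte instructions) */
    else
        *epc = e->addr;
    return true;
}
```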
- the EPC Register 598 is instantiated for each VPE 102 in the microprocessor 100 .
- the VPE 102 selects one of its thread contexts 104 to service the exception. All thread contexts 104 of the VPE 102 , other than the thread context 104 selected to service the exception, are stopped and suspended until the EXL bit 576 and ERL bit 575 of the Status Register 571 are cleared.
- when a synchronous exception is raised, the microprocessor 100 selects the thread context 104 running the thread containing the offending instruction to service the exception.
- the general purpose registers 224 , program counter 222 , and other per-thread context 226 of the offending thread context 104 are used to service the synchronous exception.
- when an asynchronous exception, such as an interrupt, is raised, the microprocessor 100 selects one of the eligible thread contexts 104 bound to the VPE 102 to service the asynchronous exception.
- the VPE 102 to which a thread context 104 is bound (as indicated by the CurVPE field 558 of the TCBind Register 556 ) is the exception domain for the thread context 104 .
- a VPE 102 selects a thread context 104 bound to it, i.e., within its exception domain, to service an exception.
- a thread context 104 utilizes the resources related to handling exceptions (such as the Coprocessor 0 EPC Register 598 and Status Register 571 ) of the exception domain, or VPE 102 , to which the thread context 104 is bound when servicing an exception.
- the method for choosing the eligible thread context 104 to service an asynchronous exception is implementation-dependent and may be adapted to satisfy the particular application in which the microprocessor 100 is employed.
- the MIPS MT ASE does not provide the capability for the asynchronous exception to specify which of the thread contexts 104 must service the asynchronous exception.
- the microprocessor 100 saves the restart address of the thread context 104 selected to service the exception in the EPC Register 598 of the VPE 102 to which the selected thread context 104 is bound. Additionally, a thread context 104 may be made ineligible for being selected to service an asynchronous exception by setting the IXMT bit 518 in its TCStatus Register 508 .
- the program counter 222 of FIG. 2 is not an architecturally-visible register, but is affected indirectly by various events and instructions.
- the program counter 222 is a virtual program counter represented by various storage elements within the microprocessor 100 pipeline, and the meaning or value of the program counter 222 depends upon the context in which it is examined or updated. For example, as a thread context 104 fetches instructions from the instruction cache 202 , the program counter 222 value is the address at which the instructions are being fetched. Thus, in this context the storage element storing the current fetch address may be viewed as the program counter 222 .
- the address written by the VPE 102 to the EPC Register 598 may be viewed as the program counter 222 value of the selected thread context 104 in this situation since when the selected thread context 104 executes an ERET instruction, fetching for the thread context 104 begins at the EPC Register 598 value.
- the TCRestart register 594 of a thread context 104 may be viewed as the program counter 222 when a thread context 104 is halted since when the thread context 104 is unhalted, fetching for the thread context 104 begins at the TCRestart register 594 value.
- the Coprocessor 0 Status Register 571 of FIG. 5M is instantiated for each VPE 102 in the microprocessor 100 . Only certain fields of the Status Register 571 are described herein. For a more detailed description of the other bits in the Status Register 571 , the reader is referred to the document MIPS32® Architecture for Programmers Volume III: The MIPS32® Privileged Resource Architecture, Document Number: MD00090, Revision 2.50, Jul. 1, 2005, which is hereby incorporated by reference in its entirety for all purposes. As discussed above, the CU 0 . . . CU 3 bits 572 and the KSU bits 574 correspond to the TCU 0 . . .
- the ERL bit 575 is set by the microprocessor 100 hardware whenever a Reset, Soft Reset, NMI, or Cache Error exception is taken.
- the EXL bit 576 is set by the microprocessor 100 hardware whenever any other exception is taken.
- When ERL 575 or EXL 576 is set, the VPE 102 is running in kernel mode with interrupts disabled.
- When the IE bit 577 is clear, all interrupts for the VPE 102 are disabled.
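- The interaction of the ERL, EXL, and IE bits can be sketched in C as follows. This is a minimal illustration, not part of the source code listing: the bit positions follow the MIPS32 PRA (IE is bit 0, EXL is bit 1, ERL is bit 2), while the helper names vpe_interrupts_enabled and vpe_forced_kernel_mode are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status register bit positions per the MIPS32 PRA. */
#define ST_IE  (1u << 0)  /* global interrupt enable */
#define ST_EXL (1u << 1)  /* exception level */
#define ST_ERL (1u << 2)  /* error level */

/* Hypothetical helper: true when the VPE may take interrupts, i.e.
 * IE is set and neither EXL nor ERL is set. */
bool vpe_interrupts_enabled(uint32_t status)
{
    return (status & ST_IE) && !(status & (ST_EXL | ST_ERL));
}

/* Hypothetical helper: true when EXL or ERL forces kernel mode,
 * regardless of the KSU field. */
bool vpe_forced_kernel_mode(uint32_t status)
{
    return (status & (ST_EXL | ST_ERL)) != 0;
}
```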
- the microprocessor 100 includes selection logic 636 that receives the contents of each of the registers of Coprocessor 0 602 , Coprocessor 1 604 , Coprocessor 2 606 , and the general purpose and DSP accumulator registers 224 of FIG.
- the source register 324 contents which is one of the register contents from the target thread context 104 , for provision to deselection logic 638 based on values of the rt 306 operand, the rx 312 operand, the u 314 operand, the h 316 operand, and the sel 318 operand of the MFTR instruction 300 , as well as the TargTC 332 operand.
- the deselection logic 638 receives the source register 324 contents selected by the selection logic 636 and writes the selected contents into the destination register 322 , which is one of the general purpose registers 224 of the issuing thread context 104 , based on the value of the rd 308 operand of the MFTR instruction 300 , as well as signals 632 and 634 indicating the issuing VPE 102 and issuing thread context 104 , respectively.
- the microprocessor 100 includes selection logic 738 that receives the contents of each of the general purpose registers 224 of the issuing thread context 104 and selects the source register 424 , which is one of the register contents from the issuing thread context 104 , for provision to deselection logic 736 based on the value of the rt 306 operand of the MTTR instruction 400 , as well as signals 632 and 634 indicating the issuing VPE 102 and issuing thread context 104 , respectively.
- the deselection logic 736 receives the source register 424 contents selected by the selection logic 738 and writes the selected contents into the destination register 422 , which is one of the registers of Coprocessor 0 602 , Coprocessor 1 604 , Coprocessor 2 606 , or the general purpose and DSP accumulator registers 224 of FIG. 2 , based on values of the rd 308 operand, the rx 312 operand, the u 314 operand, the h 316 operand, and the sel 318 operand of the MTTR instruction 400 , as well as the TargTC 332 operand.
- The selection and deselection logic of FIGS. 6 and 7 may comprise a hierarchy of multiplexers, demultiplexers, data buses, and control logic for generating a plurality of bank and register selectors to control the multiplexers and demultiplexers for selecting the appropriate values from the specified register for provision on the data buses.
- the data paths may also include intermediate registers for storing the values transferred between the issuing and target thread contexts over multiple clock cycles.
- Referring now to FIG. 8, a flowchart illustrating operation of the microprocessor 100 to execute the MFTR instruction 300 according to the present invention is shown. Flow begins at block 802 .
- the instruction issuer 208 of FIG. 2 issues an MFTR instruction 300 to the execution units 212 .
- Flow proceeds to decision block 803 .
- the execution unit 212 examines the TKSU bits 589 of the TCStatus Register 508 to determine whether the privilege level of the issuing thread context 104 is at kernel privilege level. If so, flow proceeds to decision block 804 ; otherwise, flow proceeds to block 805 .
- the execution unit 212 raises an exception to the MFTR instruction 300 since the issuing thread context 104 does not have sufficient privilege level to execute the MFTR instruction 300 .
- the execution unit 212 determines whether the target thread context 104 is halted by examining the value of the H bit 599 of the TCHalt Register 509 of FIG. 5K . If the target thread context 104 is halted, flow proceeds to decision block 808 ; otherwise flow proceeds to block 816 .
- the execution unit 212 examines the TargTC 332 value of the issuing VPE 102 VPEControl Register 504 to determine whether the TargTC 332 value is valid. In one embodiment, the TargTC 332 value is not valid if the issuing VPE is not the master VPE 102 , as indicated by a clear value in the MVP bit 553 of the VPEConf 0 Register 505 of FIG. 5F . In one embodiment, the TargTC 332 value is not valid if the thread context 104 specified by TargTC 332 is not instantiated. If the TargTC 332 value is valid, flow proceeds to decision block 812 ; otherwise, flow proceeds to block 816 .
- the execution unit 212 examines the TCU bits 581 in the TCStatus Register 508 of FIG. 5J to determine whether the MFTR instruction 300 references a coprocessor, and if so, whether the coprocessor is bound to and accessible by the target thread context 104 specified by the TargTC 332 value. If the MFTR instruction 300 references a coprocessor, and the coprocessor is not bound to and accessible by the target thread context 104 specified by the TargTC 332 value, flow proceeds to block 816 ; otherwise, flow proceeds to decision block 814 .
- the execution unit 212 determines whether the source register 324 specified by the MFTR instruction 300 is instantiated. If so, flow proceeds to block 824 ; otherwise, flow proceeds to block 816 .
- the results of the MFTR instruction 300 are invalid. That is, the microprocessor 100 attempts to perform block 824 ; however, the source, destination, and values of the data transfer are invalid. Flow ends at block 816 .
- the execution unit 212 copies the contents of the source register 324 of the target thread context 104 to the destination register 322 of the issuing thread context 104 .
- the microprocessor 100 , after reading the source register 324 , updates the source register 324 with an update value.
- the read/update is performed atomically.
- the update value is provided in the GPR 224 specified by the rd field 308 in the MFTR instruction 300 . Flow ends at block 824 .
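- The decision blocks of FIG. 8 can be modeled in software as shown below. This is only a sketch under stated assumptions: the structure tc_model, the function mftr_model, and the result codes are hypothetical names, the coprocessor binding check of block 812 is omitted, and the validity of TargTC is tested before the halted check so that the model never indexes outside its array (the outcome is the same as in the flowchart).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TCS  4   /* illustrative number of thread contexts */
#define NUM_REGS 32  /* illustrative registers per thread context */

struct tc_model {
    bool instantiated;           /* thread context exists */
    bool halted;                 /* models the H bit of TCHalt */
    bool kernel;                 /* models TKSU == kernel */
    bool reg_present[NUM_REGS];  /* is the register instantiated? */
    uint32_t regs[NUM_REGS];
};

enum mftr_result { MFTR_OK, MFTR_EXCEPTION, MFTR_INVALID };

enum mftr_result mftr_model(const struct tc_model tcs[NUM_TCS],
                            int issuer, int targ_tc, int src,
                            uint32_t *dest)
{
    assert(issuer >= 0 && issuer < NUM_TCS && dest != NULL);

    /* Block 803: the issuing thread context must be in kernel mode. */
    if (!tcs[issuer].kernel)
        return MFTR_EXCEPTION;

    /* Block 808: TargTC must name an instantiated thread context. */
    if (targ_tc < 0 || targ_tc >= NUM_TCS || !tcs[targ_tc].instantiated)
        return MFTR_INVALID;

    /* Block 804: the target thread context must be halted. */
    if (!tcs[targ_tc].halted)
        return MFTR_INVALID;

    /* Block 814: the source register must be instantiated. */
    if (src < 0 || src >= NUM_REGS || !tcs[targ_tc].reg_present[src])
        return MFTR_INVALID;

    /* Block 824: copy target source register to issuer destination. */
    *dest = tcs[targ_tc].regs[src];
    return MFTR_OK;
}
```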
- Referring now to FIG. 9, a flowchart illustrating operation of the microprocessor 100 to execute the MTTR instruction 400 according to the present invention is shown. Flow begins at block 902 .
- the instruction issuer 208 of FIG. 2 issues an MTTR instruction 400 to the execution units 212 .
- Flow proceeds to decision block 903 .
- the execution unit 212 examines the TKSU bits 589 of the TCStatus Register 508 to determine whether the privilege level of the issuing thread context 104 is at kernel privilege level. If so, flow proceeds to decision block 904 ; otherwise, flow proceeds to block 905 .
- the execution unit 212 raises an exception to the MTTR instruction 400 since the issuing thread context 104 does not have sufficient privilege level to execute the MTTR instruction 400 .
- the execution unit 212 determines whether the target thread context 104 is halted by examining the value of the H bit 599 of the TCHalt Register 509 of FIG. 5K . If the target thread context 104 is halted, flow proceeds to decision block 908 ; otherwise flow proceeds to block 916 .
- the execution unit 212 examines the TargTC 332 value of the issuing VPE 102 VPEControl Register 504 to determine whether the TargTC 332 value is valid. In one embodiment, the TargTC 332 value is not valid if the issuing VPE is not the master VPE 102 , as indicated by a clear value in the MVP bit 553 of the VPEConf 0 Register 505 of FIG. 5F . In one embodiment, the TargTC 332 value is not valid if the thread context 104 specified by TargTC 332 is not instantiated. If the TargTC 332 value is valid, flow proceeds to decision block 912 ; otherwise, flow proceeds to block 916 .
- the execution unit 212 examines the TCU bits 581 in the TCStatus Register 508 of FIG. 5J to determine whether the MTTR instruction 400 references a coprocessor, and if so, whether the coprocessor is bound to and accessible by the target thread context 104 specified by the TargTC 332 value. If the MTTR instruction 400 references a coprocessor, and the coprocessor is not bound to and accessible by the target thread context 104 specified by the TargTC 332 value, flow proceeds to block 916 ; otherwise, flow proceeds to decision block 914 .
- the execution unit 212 determines whether the destination register 422 specified by the MTTR instruction 400 is instantiated. If so, flow proceeds to block 924 ; otherwise, flow proceeds to block 916 .
- the microprocessor 100 performs no operation because there is no valid destination register to which the source data may be written. Flow ends at block 916 .
- the execution unit 212 copies the contents of the source register 424 of the issuing thread context 104 to the destination register 422 of the target thread context 104 . Flow ends at block 924 .
- Referring now to FIG. 10, a flowchart illustrating a method for performing an inter-processor interrupt (IPI) from one thread context 104 to another thread context 104 within a VPE 102 of the microprocessor 100 of FIG. 1 according to the present invention is shown.
- the steps of the flowchart substantially correlate to the source code listing included in the computer program listing appendix, and reference is made within the description of FIG. 10 to the source code listing.
- the source code listing is for a version of the Linux SMP operating system modified to view each thread context 104 of the microprocessor 100 as a separate processor, or CPU, which is referred to herein as symmetric multi-thread context (SMTC) Linux.
- the source code listing includes two C language functions (smtc_send_ipi and post_direct_ipi), one assembly language routine (smtc_ipi_vector), and one assembly language macro (CLI).
- Thread A running on thread context A 104 directs a software-emulated inter-processor interrupt (IPI) to thread context B 104 , by employing MFTR instructions 300 and MTTR instructions 400 .
- thread context A 104 and thread context B 104 are bound to the same VPE 102 .
- Although FIG. 10 illustrates only an intra-VPE IPI, the source code listing also includes instructions at lines 23 - 28 for directing a cross-VPE IPI, or inter-VPE IPI.
- a first thread context 104 is said to direct an inter-VPE IPI to a second thread context 104 if the second thread context 104 is bound to a different VPE 102 than the first thread context 104 .
- the code performs an inter-VPE IPI by placing an IPI message on a queue associated with the target thread context 104 .
- the message specifies the target thread context 104 .
- the message specifies the target thread context 104 implicitly by being on the queue associated with the target thread context 104 .
- the operating system samples the queue and drains it each time the operating system performs a context switch and returns from exception.
- After queuing the message, the code issues a MIPS PRA asynchronous software interrupt to the target VPE 102 (i.e., to the VPE 102 to which the target thread context 104 is bound) by executing an MTTR instruction 400 (within the write_vpe_c0_cause routine) to set one of the software interrupt bits in the MIPS PRA Cause Register 536 of FIG. 5P of the target VPE 102 , which will cause the queue to be sampled and drained.
- If the thread context 104 selected by the target VPE 102 to service the software interrupt is the target of the IPI, then the selected thread context 104 will service the IPI directly; otherwise, the selected thread context 104 will direct an intra-VPE IPI to the target thread context 104 in a manner similar to the operation described in the flowchart of FIG. 10 .
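- The queue-then-interrupt pattern used for an inter-VPE IPI can be sketched as below. This is an illustrative model rather than the code of the source code listing: the types ipi_msg and ipi_queue, the functions ipi_post and ipi_pop, and the queue depth are all hypothetical, and the sw_irq_pending flag merely stands in for the software interrupt bit that the real code sets in the Cause Register 536 of the target VPE 102 .

```c
#include <assert.h>
#include <stddef.h>

#define IPI_QUEUE_DEPTH 8  /* illustrative capacity */

struct ipi_msg {
    int type;  /* kind of IPI, e.g. a local clock interrupt */
    int arg;   /* message payload */
};

struct ipi_queue {
    struct ipi_msg slots[IPI_QUEUE_DEPTH];
    size_t head;         /* index of the oldest message */
    size_t count;        /* number of queued messages */
    int sw_irq_pending;  /* stands in for a Cause software interrupt bit */
};

/* Sender side: queue the message for the target thread context, then
 * flag a software interrupt. Returns 0 on success, -1 if full. */
int ipi_post(struct ipi_queue *q, struct ipi_msg m)
{
    assert(q != NULL);
    if (q->count == IPI_QUEUE_DEPTH)
        return -1;
    q->slots[(q->head + q->count) % IPI_QUEUE_DEPTH] = m;
    q->count++;
    q->sw_irq_pending = 1;
    return 0;
}

/* Receiver side: remove the oldest message. Returns 1 if a message
 * was copied to *out, 0 if the queue was empty. The pending flag is
 * cleared once the queue has been drained. */
int ipi_pop(struct ipi_queue *q, struct ipi_msg *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        q->sw_irq_pending = 0;
        return 0;
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1) % IPI_QUEUE_DEPTH;
    q->count--;
    if (q->count == 0)
        q->sw_irq_pending = 0;
    return 1;
}
```

- A receiver would call ipi_pop in a loop until it returns 0 in order to drain the queue. The sketch omits the locking that concurrent access from multiple thread contexts would require.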
- When an asynchronous hardware interrupt (such as a periodic timer interrupt used for operating system task scheduling purposes) is requested in a MIPS MT ASE processor, the VPE 102 that received the hardware interrupt request selects an eligible thread context (in this example, thread context A 104 ) to handle the exception.
- control is transferred to a general exception vector of the operating system.
- the general exception vector decodes the cause of the exception and invokes the appropriate interrupt request handler (in this example, thread A), such as the timer handler.
- each VPE 102 includes one timer in Coprocessor 0 shared by all thread contexts 104 bound to the VPE 102 (see the Count/Compare register pairs described in MIPS32® Architecture for Programmers Volume III: The MIPS32® Privileged Resource Architecture, Document Number: MD00090, Revision 2.50, Jul. 1, 2005).
- In one embodiment, only one of the timers of one of the VPEs 102 is invoked as the single timer for all CPUs of the SMP system. In another embodiment, the timer of each of the VPEs 102 is invoked for all CPUs of that VPE 102 .
- the thread context 104 selected to service the asynchronous timer interrupt executes the system clock interrupt function and then broadcasts, or directs, an IPI to all the other thread contexts 104 of the VPE 102 .
- the directed IPI is a local clock interrupt type IPI which instructs the receiving thread contexts 104 to execute only the local clock interrupt function.
- Although the SMTC Linux timer interrupt handler directs an IPI message to each thread context 104 known to the operating system as a processor, the flowchart of FIG. 10 only illustrates directing an IPI to one thread context 104 , which is thread context B 104 in this example. The operation of the microprocessor 100 in response to a timer interrupt to perform preemptive task scheduling is described in more detail in FIG. 11 .
- Flow begins at block 1002 .
- thread A running on thread context A 104 halts thread B running on thread context B 104 by executing an MTTR instruction 400 to set the H bit 599 of the TCHalt Register 509 of FIG. 5K .
- the C language function write_tc_c0_tchalt includes the MTTR instruction 400 .
- the function settc at line 36 populates the TargTC field 332 of the VPEControl Register 504 of FIG. 5E with the thread context 104 identifier of the specified thread context 104 (in the example, thread context B 104 ) for the benefit of the MTTR instruction 400 of the write_tc_c0_tchalt function.
- Flow proceeds to block 1004 .
- thread A creates a new stack frame on the kernel stack of thread context B 104 .
- the new stack frame is effectively created by the assignment of a value to the kernel stack pointer of thread context B 104 , and storing values on the new stack frame comprises storing values at predetermined offsets from the kernel stack pointer value. It is also noted that if the target thread context 104 is exempted from taking interrupts (as indicated by a set IXMT bit 518 of FIG. 5J ), the code cannot spin waiting for the target thread context 104 to become non-exempted from taking interrupts because this may lead to a deadlock condition.
- the code places the IPI message on the target thread context's 104 queue at lines 48 - 62 , in a manner similar to the inter-VPE IPI issued at line 24 ; however, in this case no inter-VPE 102 software interrupt is necessary.
- Flow proceeds to block 1006 .
- thread A reads the TCStatus Register 508 of thread context B 104 via the function read_tc_c0_tcstatus, which includes an MFTR instruction 300 .
- the TCStatus Register 508 includes the thread context B 104 execution privilege level and interrupt exemption status, among other things. Thread A, at line 104 , also saves the TCStatus Register 508 value to the stack frame created at block 1004 . Flow proceeds to block 1008 .
- thread A reads the restart address 549 of thread B from TCRestart register 594 of thread context B 104 via the function read_tc_c0_tcrestart, which includes an MFTR instruction 300 .
- Thread A, at line 102 , also saves the restart address 549 to the stack frame created at block 1004 . Flow proceeds to block 1012 .
- thread A saves the address of the operating system IPI handler and a reference to an IPI message on the stack frame created at block 1004 .
- the code manipulates the target thread context B 104 and stack frame such that a common IPI handler may be invoked to support SMTC operation.
- the common IPI handler is invoked to handle both software emulated interrupts described herein and actual hardware interrupts, i.e., interrupts for which target thread context B 104 is the thread context 104 selected by the VPE 102 to handle the hardware interrupt request, such as may be invoked at block 1114 of FIG. 11 .
- Flow proceeds to block 1014 .
- thread A writes the TCStatus Register 508 of thread context B 104 via the function write_tc_c0_tcstatus, which includes an MTTR instruction 400 , to set the execution privilege level of thread context B 104 to kernel mode and to disable, or exempt, thread context B 104 from receiving interrupts.
- thread A would set the EXL bit 576 in Coprocessor 0 Status Register 571 in order to emulate an exception.
- when EXL 576 is set, multithreading is disabled on the VPE 102 , i.e., only one thread context 104 is allowed to run when EXL 576 is set.
- thread A needs thread context B 104 to run when un-halted below at block 1018 . Therefore, the setting of EXL 576 must be left up to thread context B 104 by smtc_ipi_vector at block 1022 below. Thus, until then, thread A temporarily accomplishes a similar effect to setting EXL 576 by setting IXMT 518 and TKSU 589 to kernel mode in the thread context B 104 TCStatus Register 508 . Flow proceeds to block 1016 .
- thread A writes the address of smtc_ipi_vector to the restart address 549 in the TCRestart register 594 of thread context B 104 via the function write_tc_c0_tcrestart, which includes an MTTR instruction 400 .
- Flow proceeds to block 1018 .
- thread A un-halts, or restarts, thread context B 104 to cause smtc_ipi_vector to begin running on thread context B 104 .
- Flow proceeds to block 1022 .
- the smtc_ipi_vector sets EXL 576 , which has the effect of disabling interrupts and setting the execution privilege level to kernel mode for all thread contexts 104 bound to the VPE 102 . It is noted that at line 160 the smtc_ipi_vector disables multithreading on the VPE 102 before setting EXL 576 . Additionally, if multithreading was enabled prior to line 160 , the code restores multithreading at lines 168 - 170 .
- the smtc_ipi_vector restores the thread context B 104 pre-halted TCStatus Register 508 value that was saved at block 1006 , and in particular restores its execution privilege level and interrupt exemption state. Flow proceeds to block 1026 .
- the smtc_ipi_vector loads the EPC Register 598 with the thread context B 104 pre-halted TCRestart register 594 value saved at block 1008 . Consequently, when the standard Linux SMP return from interrupt code subsequently executes an ERET instruction at block 1036 , thread B will be restarted on thread context B 104 at the address at which it was halted at block 1002 .
- the smtc_ipi_vector effectively emulates what the microprocessor 100 hardware would do if thread context B 104 had been selected to service the asynchronous interrupt (rather than thread context A 104 ). Flow proceeds to block 1028 .
- the smtc_ipi_vector saves all of the general purpose registers 224 to the stack frame created at block 1004 . Flow proceeds to block 1032 .
- the smtc_ipi_vector sets itself to kernel mode execution privilege level and exempts itself from servicing interrupts. It is noted that this is performed only for thread context B 104 , not for the entire VPE 102 . It is noted that the CLI macro is a standard Linux macro which is modified to support SMTC by setting kernel mode execution privilege level and exempting from interrupt servicing (via the IXMT bit 518 ) only the invoking thread context 104 , rather than the entire VPE 102 (as the non-SMTC code does by clearing the IE bit 577 of the Status Register 571 of FIG. 5M ), as shown at lines 227 - 247 . Flow proceeds to block 1034 .
- the smtc_ipi_vector calls the common IPI handler (which is ipi_decode, as populated at line 108 ) with the IPI message reference saved on the stack frame at block 1012 as an argument. Flow proceeds to block 1036 .
- the smtc_ipi_vector jumps to the standard operating system return from interrupt code (which in Linux SMP is ret_from_irq), which eventually executes an ERET instruction to return execution on thread context B 104 to thread B with its pre-halted execution privilege level and interrupt exemption state.
- the return from interrupt code restores the EPC Register 598 with the restart address value saved at block 1008 and restores the Status Register 571 KSU bits 574 with the value saved at block 1006 .
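- The save, redirect, and restore sequence of FIG. 10 can be summarized with the following software model. It is a sketch and not the source code listing: the structures tc_state and ipi_frame and the functions send_emulated_ipi and emulated_vector_entry are hypothetical, register fields are modeled as plain struct members rather than real bit fields, and un-halting an interrupt-exempt target before falling back to the queue is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct tc_state {
    bool halted;       /* models the H bit of TCHalt */
    bool kernel_mode;  /* models TKSU of TCStatus set to kernel */
    bool ixmt;         /* models the IXMT bit of TCStatus */
    uint32_t restart;  /* models TCRestart */
};

struct ipi_frame {     /* values saved on the target's kernel stack */
    bool saved_kernel_mode;
    bool saved_ixmt;
    uint32_t saved_restart;
    uint32_t handler;  /* address of the common IPI handler */
    const void *msg;   /* reference to the IPI message */
};

/* Sender side, blocks 1002 through 1018. Returns false when the
 * target is interrupt-exempt, in which case the caller would queue
 * the message instead of spinning. */
bool send_emulated_ipi(struct tc_state *b, struct ipi_frame *f,
                       uint32_t vector, uint32_t handler,
                       const void *msg)
{
    assert(b != NULL && f != NULL);
    b->halted = true;                       /* block 1002 */
    if (b->ixmt) {
        b->halted = false;
        return false;
    }
    f->saved_kernel_mode = b->kernel_mode;  /* block 1006 */
    f->saved_ixmt = b->ixmt;
    f->saved_restart = b->restart;          /* block 1008 */
    f->handler = handler;                   /* block 1012 */
    f->msg = msg;
    b->kernel_mode = true;                  /* block 1014 */
    b->ixmt = true;
    b->restart = vector;                    /* block 1016 */
    b->halted = false;                      /* block 1018 */
    return true;
}

/* Receiver side, restoring the pre-halt TCStatus fields. Returns the
 * value that would be loaded into EPC so that a later ERET resumes
 * the interrupted thread. */
uint32_t emulated_vector_entry(struct tc_state *b,
                               const struct ipi_frame *f)
{
    assert(b != NULL && f != NULL);
    b->kernel_mode = f->saved_kernel_mode;
    b->ixmt = f->saved_ixmt;
    return f->saved_restart;
}
```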
- Referring now to FIG. 11, a flowchart illustrating a method for performing preemptive process scheduling by a symmetric multiprocessor operating system (SMP OS), such as Linux SMP, on the microprocessor 100 of FIG. 1 according to the present invention is shown.
- Symmetric multiprocessor operating systems manage a plurality of processes, or tasks, and assign the execution of the processes to particular processors, or CPUs, of the symmetric multiprocessor system, which are thread contexts 104 in the case of microprocessor 100 .
- the preemptive SMP OS schedules the set of processes to run on the assigned thread context 104 in some time-multiplexed fashion according to the scheduling algorithm of the SMP OS.
- Flow begins at block 1102 .
- a timer generates an interrupt request to one of the VPEs 102 , which are the exception domains of the microprocessor 100 .
- the timer interrupt request is an asynchronous hardware interrupt generated by the MIPS PRA Count/Compare register pairs of one of the VPEs 102 of microprocessor 100 , and the Count/Compare register pairs of the other VPEs 102 are all disabled.
- Flow proceeds to block 1104 .
- the interrupted VPE 102 selects an eligible thread context 104 bound to itself to service the timer interrupt request.
- a thread context 104 is eligible if its IXMT bit 518 is clear and the curVPE field 558 of the TCBind Register 556 of FIG. 5K specifies to which VPE 102 the thread context 104 is bound.
- the method for choosing the eligible thread context 104 to service an asynchronous exception is implementation-dependent and may be adapted to satisfy the particular application in which the microprocessor 100 is employed. For example, the VPE 102 may select an eligible thread context 104 in a random fashion.
- the VPE 102 may select an eligible thread context 104 in a round-robin order.
- the VPE 102 may select a thread context 104 based on the relative priorities of the thread contexts 104 , such as selecting the thread context 104 having the lowest relative instruction issue priority, or a lowest relative priority for servicing exceptions. Flow proceeds to block 1106 .
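- One of the selection policies mentioned above, round-robin order, can be sketched in software as follows. The selection is performed by hardware in the microprocessor 100 , so this is only an illustrative model: the structure tc_info and the function select_tc_round_robin are hypothetical names, and eligibility is reduced to a clear IXMT bit and a matching CurVPE value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct tc_info {
    bool ixmt;    /* IXMT bit: set means exempt from interrupts */
    int cur_vpe;  /* VPE to which the thread context is bound */
};

/* Returns the index of the next eligible thread context after
 * `last` (use -1 for the first selection), or -1 if no thread
 * context bound to `vpe` is eligible. */
int select_tc_round_robin(const struct tc_info *tcs, int n,
                          int vpe, int last)
{
    assert(tcs != NULL && n > 0 && last >= -1 && last < n);
    for (int step = 1; step <= n; step++) {
        int i = (last + step) % n;
        if (!tcs[i].ixmt && tcs[i].cur_vpe == vpe)
            return i;
    }
    return -1;
}
```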
- the VPE 102 suspends execution of the threads executing on all thread contexts 104 bound to the VPE 102 except for the thread context 104 selected at block 1104 . In particular, the VPE 102 ceases to issue instructions to the execution pipeline of the threads. Flow proceeds to block 1108 .
- the VPE 102 saves the restart address of the selected thread context 104 into the EPC Register 598 , sets the EXL bit 576 of the Status Register 571 , and populates the MIPS PRA Cause register 536 , all of the VPE's 102 Coprocessor 0 VPE context 106 . Flow proceeds to block 1112 .
- the VPE 102 causes the selected thread context 104 to execute a general exception handler at the general exception vector according to the MIPS PRA.
- the general exception handler decodes the cause of the exception via the MIPS PRA Cause register 536 and Status Register 571 and determines the exception was an asynchronous hardware interrupt generated by the timer. Consequently, the general exception handler calls the timer interrupt service routine, which among other functions, schedules processes according to the preemptive multitasking algorithm of the operating system. In one embodiment, the timer interrupt routine may call a separate routine dedicated to scheduling processes. Flow proceeds to block 1114 .
- the timer interrupt service routine determines whether a new process, or task, should be scheduled on the selected thread context 104 according to the SMP OS multitasking scheduling algorithm. If so, the timer interrupt service routine schedules a new process to run on the selected thread context 104 ; otherwise, the timer interrupt service routine leaves the current process to run on the selected thread context 104 .
- a thread and a process herein are not necessarily synonymous.
- a process is an entity managed by the SMP operating system, and typically comprises entire programs, such as application programs or portions of the operating system itself; whereas a thread is simply a stream of instructions, which of course may be a stream of instructions of an operating system process, or task. Flow proceeds to block 1116 .
- the timer interrupt service routine issues a software-emulated inter-processor interrupt to each other thread context 104 in the microprocessor 100 , according to FIG. 10 and/or the source code listing.
- the timer interrupt service routine performs a software-emulated inter-processor interrupt to the target thread context 104 according to FIG.
- the timer interrupt service routine places the timer interrupt service IPI message on the target thread context's 104 queue at lines 48 - 62 of the source code; and if the target thread context 104 is bound to a different VPE 102 than the selected thread context 104 , then the timer interrupt service routine will place an IPI message on a queue associated with the target thread context 104 and issue a MIPS PRA asynchronous software interrupt to the target VPE 102 , i.e., to the VPE 102 to which the target thread context 104 is bound, according to lines 23 - 28 of the source code, which will cause the queue to be sampled and drained.
- the timer interrupt service routine calls the operating system return from interrupt code, which executes an ERET instruction. If a new process was scheduled to run at block 1114 , then the ERET causes the newly scheduled process to run; otherwise, the ERET causes the process that was interrupted by the timer interrupt request to continue running. Flow proceeds to block 1122 .
- each thread context 104 that was the target of a software-emulated inter-processor interrupt performed at block 1116 eventually calls the inter-processor interrupt service routine, according to block 1034 of FIG. 10 , after performing the other steps of FIG. 10 .
- the inter-processor interrupt service routine calls the timer interrupt service routine, which schedules a new process to run on the thread context 104 , if appropriate, similar to the manner described above with respect to block 1114 .
- When the inter-processor interrupt handler completes, the operating system return from interrupt code is called, which executes an ERET instruction, according to block 1036 of FIG. 10 .
- If the timer interrupt service routine scheduled a new process to run on the thread context 104 , then the newly scheduled process will run on the thread context 104 when the return from interrupt code executes the ERET at block 1036 of FIG. 10 , rather than thread B, i.e., rather than the thread that was halted by the software-emulated directed inter-processor interrupt. If so, thread B will eventually be scheduled to run again so that it may complete. If the timer interrupt service routine did not schedule a new process to run on the thread context 104 , then thread B will continue running when the ERET is executed. Flow ends at block 1122 .
- the software emulation of directed exceptions described according to FIG. 10 enables the SMP OS to treat each thread context as an operating system level CPU, in particular with regard to preemptive process scheduling.
- the multiprocessor system 1200 comprises a plurality of CPUs, denoted CPU 0 through CPU 3 .
- Each of the CPUs is a conventional MIPS Architecture processor, i.e., without the benefit of the MIPS MT ASE.
- Each of the CPUs includes a MIPS PRA Coprocessor 0 Status register 571 , Context Register 527 , Cause Register 536 , and EntryHi Register 526 , substantially similar to those shown in FIGS.
- each of the CPUs comprises its own translation lookaside buffer (TLB) 1202 and floating point unit (FPU) 1206 .
- the FPU 1206 , commonly referred to as Coprocessor 1 in the MIPS PRA, is a processing unit specifically designed for expeditiously executing floating point instructions in hardware rather than emulating execution of the floating point instruction in software.
- the TLB 1202 is a relatively small cache memory used to cache recently used virtual to physical address translations.
- the TLB 1202 is part of a memory management unit (MMU) of each CPU that enables the CPU to provide virtual memory functionality to programs executing thereon.
- the MIPS32® Architecture for Programmers Volume III: The MIPS32® Privileged Resource Architecture, Document Number: MD00090, Revision 2.50, Jul. 1, 2005 describes in more detail the organization and operation of the TLB 1202 and MMU.
- the TLB 1202 and Coprocessor 0 Registers are privileged resources, as are the shared TLB 1302 and shared Coprocessor 0 Registers of each VPE 102 (including the interrupt control registers) of FIG. 13 .
- the MIPS ISA includes privileged instructions (e.g., tlbr, tlbwr, tlbwi, tlbp, mfc0, mtc0) for accessing the TLB 1202 / 1302 and Coprocessor 0 Registers (including the interrupt control registers) that may not be executed by user privilege level threads, but may only be accessed by threads with kernel privilege level; otherwise, an exception is generated.
- the operating system, such as SMP Linux, maintains an ASID cache 1204 for each CPU.
- An ASID is an address space identifier, which identifies a unique memory map.
- a memory map comprises a mapping, or association, or binding, between a virtual address space and a set of physical page addresses.
- the operating system creates a memory map when it creates a new process, or task. Each process created by the operating system has a memory map. Additionally, the operating system has its own memory map. Multiple processes may share a memory map. Consequently, two CPUs using a shared memory map will result in the same virtual address accessing the same physical memory, or generating identical page fault exceptions.
- An example in a UNIX-like operating system of two processes sharing a memory map is when a process makes a fork( ) system call (not to be confused with the MIPS MT ASE FORK instruction).
- a new process is created which shares the same memory map as its parent process until such time as one of the processes performs a store to memory which would change the contents of the memory.
- a multithreaded process may have multiple threads running in the same address space using the same memory map.
- multiple processes may specifically designate particular memory pages that they share.
- a memory map comprises a simple contiguous array of page table entries, with each entry specifying a virtual to physical page address translation and other relevant page attribute information.
- Because a linear page table may require a significant amount of contiguous memory per process (such as in an embedded application with relatively small pages, such as 4 KB pages, and a relatively large address space), other memory map schemes may be employed.
- a multi-level page/segment table structure may be employed in which a memory map is described by a segment table which in turn points to a set of page table entries, some of which (in particular, those which correspond to unpopulated parts of the address space) may be common to multiple memory maps.
- the ASID cache 1204 is a kernel variable maintained in the system memory for each of the CPUs.
- the operating system uses the ASID cache 1204 to assign a new ASID to a newly created memory map, or to assign a new ASID for the respective CPU to an existing memory map that was previously used on another CPU.
- the operating system initializes each ASID cache 1204 value to zero.
- Each time the instance of the operating system executing on a respective CPU assigns a new ASID value from the ASID cache 1204 the operating system monotonically increments the ASID cache 1204 value of the respective CPU. This process continues until the ASID cache 1204 value wraps back to zero and the cycle continues.
- the TLB 1202 is a small cache memory in which each entry includes a tag portion and a data portion.
- the tag portion includes a virtual page address, or virtual page number (VPN), portion that is concatenated with an ASID portion.
- When the CPU generates a virtual memory address to make a memory access, such as when a load or store instruction is executed, the virtual memory address is concatenated with the ASID of the process making the memory access, and the result is compared with the TLB 1202 tags to see if a match occurs.
- the ASID of the process making the memory access is supplied by the ASID field 538 of the EntryHi Register 526 of FIG. 5N of the CPU executing the process.
- Each time the conventional operating system schedules a process to run on a CPU, i.e., swaps the process in to the CPU, the operating system loads the ASID identifying the memory map of the thread into the EntryHi Register 526 so that the ASID of the process making the memory access is supplied by the ASID field 538 of the EntryHi Register 526. If a match does not occur (a TLB miss), the CPU generates a TLB miss exception, and the operating system responsively fetches the missing page address translation information from the appropriate memory map, allocates an entry in the TLB 1202, and fills the entry with the fetched page address translation information.
- the TLB 1202 outputs the data portion of the matching entry, which includes a physical page address, or physical frame number (PFN), and attributes of the memory page.
- Because the TLB 1202 tag includes the ASID, the TLB 1202 can simultaneously cache address translations for multiple memory maps. It is noted that because each CPU in the conventional system 1200 has its own ASID cache 1204, the ASID name spaces of the CPUs overlap. This overlap functions properly in the conventional system 1200 because each CPU in the system 1200 also has its own TLB 1202. However, as discussed below, the present invention modifies the operating system to employ a common ASID cache 1304 of FIG. 13 since the CPUs (thread contexts 104) share a common TLB 1302 in the system 100 of the present invention.
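- By way of illustration only, the tag comparison described above may be sketched in C as follows. This is a simplified software model, not part of the source code listing: the names tlb_entry, tlb_tag, tlb_lookup, and TLB_MISS are hypothetical, and the 8-bit ASID width is assumed. The sketch shows why two memory maps may cache a translation for the same virtual page number as long as their ASIDs differ.

```c
#include <stddef.h>
#include <stdint.h>

#define TLB_MISS UINT32_MAX  /* hypothetical sentinel: no tag matched */

/* Hypothetical model of one TLB entry: tag portion plus data portion. */
struct tlb_entry {
    int      valid;
    uint32_t vpn;   /* tag: virtual page number */
    uint8_t  asid;  /* tag: address space identifier (assumed 8 bits) */
    uint32_t pfn;   /* data: physical frame number */
};

/* Form the tag by concatenating the VPN with the ASID. */
uint64_t tlb_tag(uint32_t vpn, uint8_t asid)
{
    return ((uint64_t)vpn << 8) | asid;
}

/* Return the PFN of the entry whose tag matches, or TLB_MISS. */
uint32_t tlb_lookup(const struct tlb_entry *tlb, size_t n,
                    uint32_t vpn, uint8_t asid)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tlb[i].valid &&
            tlb_tag(tlb[i].vpn, tlb[i].asid) == tlb_tag(vpn, asid))
            return tlb[i].pfn;
    }
    return TLB_MISS;  /* the operating system would refill the TLB here */
}
```

- In this model, if two CPUs sharing one TLB used the same ASID for different memory maps, both would match the same entry, which is the hazard that the common ASID cache 1304 is intended to avoid.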
- each CPU comprises the entire architectural state of a MIPS Architecture processor, and in particular, includes all the state expected by a conventional SMP operating system, such as SMP Linux for MIPS, to be a MIPS CPU.
- the operating system views the system 1200 of FIG. 12 as having a number of CPUs equal to the number of actual full architectural state CPUs, which in FIG. 12 is four.
- the operating system views the system 100 of FIG. 13 of the present invention as having a number of CPUs equal to the number of thread contexts 104 , which in FIG. 13 is M+1, each of which is a lightweight, highly scalable set of state that comprises far less than the full architectural state of a MIPS Architecture processor.
- Referring now to FIG. 13, a block diagram illustrating a multiprocessor system 100 according to the present invention is shown.
- the multiprocessor system 100 of FIG. 13 is similar to the multiprocessor system 100 of FIG. 1 ; however, the operating system running on the system 100 of FIG. 13 views each thread context 104 as a separate CPU, or processor. This is in contrast to the conventional system 1200 of FIG. 12 , and is also in contrast to a MIPS MT ASE processor-based system in which the operating system is configured to view each VPE 102 as a CPU.
- the system 100 of FIG. 13 includes a plurality of thread contexts 104 , denoted TC 0 104 through TC M 104 .
- the system 100 includes a plurality of VPEs 102 denoted VPE 0 102 through VPE N 102 .
- Each TC 104 includes a TCStatus register 508 of FIG. 5J , a TCBind register 556 of FIG. 5K , and a TCContext register 595 of FIG. 5L .
- Each VPE 102 includes a Status Register 571 of FIG. 5M , a Context register 527 of FIG. 5N , a Cause Register 536 of FIG. 5P , and an EntryHi Register 526 of FIG. 5N .
- the thread contexts 104 and VPEs 102 of the system 100 comprise more state than shown in FIG. 13, and in particular, include all the state as described above with respect to FIGS. 1 through 11; however, the state shown in FIG. 13 is included for its relevance to the remaining Figures.
- the system 100 of FIG. 13 also includes a TLB 1302 , ASID cache 1304 , and FPU 1306 that are shared by all of the thread contexts 104 in the system 100 . Additionally, as described in detail above, multiple thread contexts 104 bound to a VPE 102 share interrupt control logic with the VPE's 102 exception domain. Consequently, conventional MP operating systems, such as Linux SMP, must be modified according to the present invention to accommodate the sharing of the TLB 1302 , ASID cache 1304 , interrupt control logic, and FPU 1306 by the multiple thread contexts 104 , as described herein. Embodiments are contemplated in which multiple FPU contexts 1306 are shared among the CPUs/TCs 104 .
- SMTC Linux sets the STLB bit 511 of the MVPControl Register 501 of FIG. 5B to enable all of the VPEs 102 to share the TLB 1302 .
- Other embodiments are contemplated in which a TLB 1302 is present for each VPE 102 and the TLB 1302 is shared by all of the thread contexts 104 bound to the VPE 102.
- In contrast to the system 1200 of FIG. 12, the ASID of the thread making the memory access is supplied by the TASID field 528 of the TCStatus Register 508 of the thread context 104 executing the thread, rather than by the ASID field 538 of the EntryHi Register 526, since the EntryHi Register 526 of FIG. 5N is only instantiated on a per-VPE 102 basis, not a per-TC 104 basis.
- the operating system loads the ASID identifying the memory map of the thread into the TASID field 528 of the TCStatus Register 508 of the thread context 104 so that the ASID of the process making the memory access is supplied by the TASID field 528 .
- the operating system writes the ASID into the ASID field 538 of the EntryHi Register 526 , which propagates through to the TASID field 528 .
- Each of the CPUs in the system 1200 of FIG. 12 executes an instance of the Linux kernel and has a distinct value being returned from the smp_processor_id( ) function that can be used to access facilities that are instantiated for each CPU, such as local run queues and inter-processor interrupts.
- each thread context 104 in the system 100 of FIG. 13 executes an instance of the SMTC Linux kernel and has a distinct value being returned from the smp_processor_id( ) function that can be used to access facilities that are instantiated for each CPU, such as local run queues and inter-processor interrupts.
- each thread context 104 comprises a set of hardware storage elements that store sufficient state to execute a Linux thread, either a thread of the operating system or a user thread.
- the system 1200 of FIG. 12 includes one of the CPUs which is designated the first, or primary, Linux CPU that is used during the SMP Linux for MIPS boot sequence to perform low-level, system wide initialization, and contrive for all other CPUs to begin executing their instances of the Linux kernel at the SMP start_secondary( ) entry point.
- the system 100 of FIG. 13 includes one of the thread contexts 104, namely the thread context 104 which has a value of zero in the CurTC field 557 of the TCBind Register 556 of FIG. 5K, which is designated the primary, or boot, Linux CPU (SMTC Linux CPU number 0).
- each CPU/TC 104 executes an instance of the SMP Linux process scheduler which schedules the processes, or threads, to execute on the CPU/TC 104 . That is, each instance of the process scheduler determines the particular thread that will be allowed to employ the thread context 104 resources (e.g., program counter 222 , general purpose registers 224 , integer multiplier, etc) to execute the thread during a particular time slice.
- the Linux process scheduler running on each CPU/TC 104 maintains its own run queue of threads to execute. Still further, each CPU in the system 1200 of FIG. 12 has an entry in the SMP Linux for MIPS cpu_data array, an entry 1408 of which is shown in FIG. 14 . Similarly, each thread context 104 in the system 100 of FIG. 13 has an entry 1408 in the SMTC Linux cpu_data array.
- Referring now to FIG. 14, a block diagram of a cpu_data array entry 1408 in an SMTC Linux operating system according to the present invention is shown.
- the conventional SMP Linux operating system maintains a cpu_data array that includes one entry for each CPU recognized by SMP Linux.
- the array is indexed by a CPU number assigned to each individual CPU.
- Each entry stores information, referred to in FIG. 14 as original fields 1402 , about the CPU, such as the CPU type, information about the FPU 1306 , the size of the TLB 1302 , pre-emption timer-related information, and cache-related information.
- the original fields 1402 of conventional SMP Linux also include the ASID cache 1204 for each CPU, denoted asid_cache in the source code listing at line 447 .
- SMTC Linux uses the asid_cache storage space in the original fields 1402 effectively as a single ASID cache 1304 by updating each asid_cache field in each cpu_data array entry 1408 even when generating a new ASID value for only a single CPU/TC 104 .
- the SMTC Linux entry 1408 includes two additional fields: the TC_ID field 1404 and the VPE_ID field 1406.
- the TC_ID field 1404 identifies the thread context 104 of the Linux CPU associated with the cpu_data entry 1408 .
- the operating system populates the TC_ID field 1404 with the value stored in the CurTC field 557 of the TCBind Register 556 of FIG. 5K of the thread context 104 .
- the value used to index the cpu_data array is referred to as the CPU number.
- the VPE_ID field 1406 identifies the VPE 102 to which is bound the thread context 104 of the Linux CPU associated with the cpu_data entry 1408 .
- the operating system populates the VPE_ID field 1406 with the value stored in the CurVPE field 558 of the TCBind Register 556 of FIG. 5K of the thread context 104 .
- Referring now to FIG. 15, a flowchart illustrating operation of the SMTC operating system on a system 100 of FIG. 13 according to the present invention is shown.
- the flowchart illustrates modifications to the conventional SMP Linux to accommodate the fact that the thread contexts 104 share common resources of the system, such as the FPU 1306 , TLB 1302 , and caches.
- Flow begins at block 1502 .
- the operating system begins its initialization sequence. Flow proceeds to block 1504 .
- the initialization sequence invokes the SMP Linux cpu_probe( ) routine only for TC 0 104, which corresponds to SMTC Linux CPU number 0 (the primary, or boot, CPU/TC 104), in order to populate the cpu_data array entry 1408 at index 0.
- Flow proceeds to block 1506 .
- the initialization sequence copies the cpu_data array entry 1408 at index 0 to all the other entries in the cpu_data array, i.e., to the entry for each of the other CPUs/TCs 104 .
- Flow proceeds to block 1508 .
- the initialization sequence updates the TC_ID field 1404 and VPE_ID field 1406 of the cpu_data array entry 1408 for each of the CPUs/TCs 104 based on their CurTC field 557 and CurVPE field 558 values, respectively. It is noted that prior to the step at block 1508, the binding of thread contexts 104 to VPEs 102 has been performed, i.e., the CurVPE field 558 for each thread context 104 has been populated. In one embodiment, the operating system performs the binding of thread contexts 104 to VPEs 102.
- Referring now to FIG. 17, three flowcharts illustrating operation of the SMTC operating system on a system 100 of FIG. 13 according to the present invention are shown.
- the flowcharts illustrate modifications to the conventional SMP Linux interrupt enable and interrupt disable routines to accommodate the fact that although each thread context 104 is a Linux CPU, the interrupt control logic is not replicated for each thread context 104 , i.e., each thread context 104 does not have its own interrupt control logic and is thus not its own exception domain; rather, each thread context's 104 exception domain is the VPE 102 to which the thread context 104 is bound, i.e., each VPE 102 comprises interrupt control logic that is a resource shared by each of the thread contexts 104 bound to the VPE 102 , as indicated by the CurVPE bits 558 of the TCBind Register 556 of FIG. 5K .
- Flow begins at block 1702 .
- the operating system begins its initialization sequence. Flow proceeds to block 1704 .
- the operating system sets the IE bit 577 in the Status Register 571 of FIG. 5M in order to enable interrupts globally for all thread contexts 104 of the VPE 102 .
- the operating system performs the step at block 1704 near the end of its initialization sequence, in particular, after each of the interrupt service routines has been set up and the operating system is ready to begin servicing interrupts. Flow ends at block 1704.
- Flow of the second flowchart of FIG. 17 begins at block 1712 .
- a thread executing on a thread context 104 invokes an interrupt disable routine, such as the CLI macro at source code lines 215 - 250 , on a CPU/TC 104 executing the thread.
- the interrupt disable routine sets the IXMT bit 518 of the TCStatus Register 508 of FIG. 5J of the thread context 104 executing the thread, such as is performed in the source code lines 233 - 240 .
- this disables interrupts only for the CPU/TC 104 executing the interrupt disable routine, rather than for all thread contexts 104 of the VPE 102 .
- Flow of the third flowchart of FIG. 17 begins at block 1722.
- In another embodiment, the binding of thread contexts 104 to VPEs 102 may be performed when the microprocessor 100 is synthesized or fabricated. Additionally, the initialization sequence updates the cpu_data array entry 1408 for each of the CPUs/TCs 104 to indicate whether it has permission to access the FPU 1306.
- the TCU 1 bit 581 of the TCStatus Register 508 of FIG. 5J indicates whether a CPU/TC 104 has permission to access the FPU 1306 . It is noted that only a single invocation of the cpu_probe( ) routine is necessary since each of the CPUs/TCs 104 share the same set of resources, namely the FPU 1306 , TLB 1302 , and caches. Flow proceeds to block 1512 .
- the initialization sequence invokes the per_cpu_trap_init( ) routine only for one thread context 104 for each VPE 102 since the VPE 102 is the exception domain for the thread contexts 104 bound to it; that is, each thread context 104 is not its own exception domain, particularly since asynchronous exceptions may not be directed specifically to a particular thread context 104 , as discussed above.
- This is in contrast to conventional SMP Linux in which the per_cpu_trap_init( ) routine is invoked once per CPU, since each CPU in the conventional system 1200 is an exception domain.
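- By way of illustration only, the initialization steps of FIG. 15 (blocks 1504 through 1512) may be sketched in C as follows. This is a simplified model rather than the SMTC Linux source: the structure cpu_info, the stand-in probe_cpu( ), the constant MAX_VPES, and the tc_to_vpe binding array are hypothetical, the probed value is invented, and the invocation of per_cpu_trap_init( ) is represented only by a counter.

```c
#include <string.h>

#define MAX_VPES 16  /* hypothetical upper bound on VPE numbers */

/* Hypothetical, reduced stand-in for a cpu_data array entry 1408. */
struct cpu_info {
    int tlb_size;  /* one of the original fields filled in by the probe */
    int tc_id;     /* TC_ID field 1404 */
    int vpe_id;    /* VPE_ID field 1406 */
};

/* Stand-in for cpu_probe( ); the value written here is invented. */
void probe_cpu(struct cpu_info *c)
{
    memset(c, 0, sizeof *c);
    c->tlb_size = 64;
}

/*
 * tc_to_vpe[tc] models the CurVPE field of each TC (values < MAX_VPES).
 * Returns the number of per-VPE trap initializations performed.
 */
int smtc_init_cpu_data(struct cpu_info *cpu_data, int ntcs,
                       const int *tc_to_vpe)
{
    int vpe_done[MAX_VPES] = {0};
    int trap_inits = 0;
    int tc;

    probe_cpu(&cpu_data[0]);              /* block 1504: probe TC 0 only */

    for (tc = 1; tc < ntcs; tc++)         /* block 1506: copy entry 0 */
        cpu_data[tc] = cpu_data[0];

    for (tc = 0; tc < ntcs; tc++) {       /* block 1508: per-TC fields */
        cpu_data[tc].tc_id = tc;
        cpu_data[tc].vpe_id = tc_to_vpe[tc];
    }

    for (tc = 0; tc < ntcs; tc++) {       /* block 1512: once per VPE */
        int vpe = tc_to_vpe[tc];

        if (!vpe_done[vpe]) {
            vpe_done[vpe] = 1;  /* per_cpu_trap_init( ) would run here */
            trap_inits++;
        }
    }
    return trap_inits;
}
```

- With four thread contexts bound two apiece to two VPEs, the model probes once, fills four entries, and performs the trap initialization twice, once for each exception domain.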
- Referring now to FIG. 16, two flowcharts illustrating operation of the SMTC operating system on a system 100 of FIG. 13 according to the present invention are shown.
- the flowcharts illustrate modifications to the conventional SMP Linux to accommodate the sharing of the FPU 1306 by the thread contexts 104 of the system 100 of FIG. 13 .
- Flow begins at block 1602 .
- a thread executing on one of the thread contexts 104 includes a floating point instruction.
- the thread context 104 does not have permission to access the FPU 1306 . Therefore, a floating point exception is taken so that a floating point instruction emulation may be performed.
- Flow proceeds to block 1604 .
- the operating system increments a count associated with the thread for which the floating point emulation was performed. Flow proceeds to decision block 1606 .
- the operating system determines whether the count has exceeded a threshold parameter. If not, flow ends; otherwise, flow proceeds to block 1608 .
- the operating system sets a cpus_allowed mask, which is a kernel variable, to cause the operating system to schedule the thread on a thread context 104 that has permission to access the FPU 1306 during a subsequent time slice.
- a time slice is a time quantum used by the operating system to schedule processes or threads and is typically an integer multiple of the timer interrupt time quantum.
- Flow of the second flowchart of FIG. 16 begins at block 1612 .
- a time slice of the operating system completes and the operating system performs its thread scheduling. Flow proceeds to decision block 1614 .
- the operating system determines whether the thread executed any floating point instructions during the time slice. In one embodiment, the thread has not executed any floating point instructions during the time slice if the CU 1 bit 572 in the Status Register 571 of FIG. 5M is clear. If the thread has executed any floating point instructions during the time slice, flow ends; otherwise, flow proceeds to block 1616 .
- the operating system clears the cpus_allowed mask to enable the operating system to schedule the thread on a thread context 104 that does not have permission to access the FPU 1306 during a subsequent time slice. Flow ends at block 1616 .
- the method described in the flowcharts of FIG. 16 provides less variability in the execution times of floating point-intensive programs in an SMTC system 100. It is noted that an alternative to the operation of FIG. 16 is to allow the SMP Linux cpu_has_fpu macro to evaluate true only for one CPU/TC 104. However, this alternative would cause extreme variability in the execution times of floating point-intensive programs, depending upon the percentage of their execution time that is scheduled by the operating system on a thread context 104 that does not have permission to access the FPU 1306.
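- By way of illustration only, the two flowcharts of FIG. 16 may be sketched in C as follows. The structure task, the functions note_fp_emulation( ) and end_of_time_slice( ), and the threshold value are hypothetical. Two points are assumptions not stated above: clearing the cpus_allowed restriction is modeled as restoring a mask of all thread contexts, and the emulation count is restarted when the restriction is lifted.

```c
#define FPU_EMU_THRESHOLD 10  /* hypothetical threshold parameter */

/* Hypothetical, reduced per-thread scheduling state. */
struct task {
    unsigned long cpus_allowed;  /* bit n set: may run on CPU/TC n */
    unsigned int  fpemu_count;   /* emulated floating point instructions */
};

/*
 * Blocks 1604 through 1608: called after a floating point instruction
 * has been emulated for the thread. fpu_tcs is the mask of thread
 * contexts that have permission to access the FPU.
 */
void note_fp_emulation(struct task *t, unsigned long fpu_tcs)
{
    t->fpemu_count++;
    if (t->fpemu_count > FPU_EMU_THRESHOLD)
        t->cpus_allowed = fpu_tcs;  /* steer the thread toward the FPU */
}

/*
 * Blocks 1614 through 1616: called when a time slice completes.
 * used_fpu is nonzero if the thread executed any floating point
 * instruction during the slice.
 */
void end_of_time_slice(struct task *t, int used_fpu, unsigned long all_tcs)
{
    if (!used_fpu) {
        t->cpus_allowed = all_tcs;  /* assumption: lift the restriction */
        t->fpemu_count = 0;         /* assumption: restart the count */
    }
}
```

- The threshold keeps a thread that executes only an occasional floating point instruction from being confined to the thread contexts that may access the FPU 1306.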
- a thread executing on a thread context 104 invokes an interrupt enable routine, for example a Linux STI macro, on a CPU/TC 104 executing the thread.
- the interrupt enable routine clears the IXMT bit 518 of the TCStatus Register 508 of FIG. 5J of the thread context 104 executing the thread, similar to, but an inverse operation of, the instructions in the CLI macro.
- this enables interrupts only for the CPU/TC 104 executing the interrupt enable routine, rather than for all thread contexts 104 of the VPE 102 .
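- By way of illustration only, the per-TC interrupt disable and enable operations may be modeled in C as follows. This is a software model, not the CLI or STI macro of the source code listing: the structures vpe_model and tc_model and the three functions are hypothetical, the position of the IXMT bit within TCStatus is assumed, and conditions other than the IE bit 577 and the IXMT bit 518 (such as EXL) are deliberately not modeled.

```c
#include <stdint.h>

#define TCSTATUS_IXMT (1u << 10)  /* assumed position of the IXMT bit 518 */

/* One IE bit per VPE: the exception domain shared by its bound TCs. */
struct vpe_model {
    int ie;
};

/* One TCStatus value per thread context. */
struct tc_model {
    uint32_t tcstatus;
    const struct vpe_model *vpe;  /* the VPE to which the TC is bound */
};

/* Interrupt disable: inhibit interrupts on this thread context only. */
void tc_irq_disable(struct tc_model *tc)
{
    tc->tcstatus |= TCSTATUS_IXMT;
}

/* Interrupt enable: the inverse operation, again for this TC only. */
void tc_irq_enable(struct tc_model *tc)
{
    tc->tcstatus &= ~TCSTATUS_IXMT;
}

/* A TC may take an interrupt only if its VPE enables them and IXMT is clear. */
int tc_irq_enabled(const struct tc_model *tc)
{
    return tc->vpe->ie && !(tc->tcstatus & TCSTATUS_IXMT);
}
```

- In this model, disabling interrupts on one thread context leaves the other thread contexts bound to the same VPE able to take interrupts, whereas clearing the IE bit would affect all of them.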
- Referring now to FIG. 18, a flowchart illustrating operation of the SMTC operating system on a system 100 of FIG. 13 according to the present invention is shown.
- the flowchart of FIG. 18 illustrates modifications to the conventional SMP Linux general interrupt vector and common return from interrupt code to accommodate the fact that although each thread context 104 is a Linux CPU, each thread context 104 is not its own exception domain, but rather each thread context's 104 exception domain is the VPE 102 to which the thread context 104 is bound.
- the modifications advantageously prevent the undesirable situation in which multiple thread contexts 104 of a VPE 102 would otherwise service the same interrupt request instance.
- Flow begins at block 1802 .
- an interrupt request is activated.
- the VPE 102 receiving the interrupt request sets the EXL bit 576 in the Status Register 571 of FIG. 5M , which has the effect of disabling the VPE 102 from taking subsequent interrupts.
- Setting the EXL bit 576 also has the advantageous effect of suspending the instruction scheduler 216 from issuing for execution instructions of the various other thread contexts 104 of the VPE 102 taking the interrupt request.
- the VPE 102 selects an eligible thread context 104 to service the interrupt request and causes the general interrupt vector code to commence running on the selected thread context 104 . Flow proceeds to block 1804 .
- the interrupt vector code saves the contents of the Cause Register 536 of FIG. 5P to the TCContext Register 595 of FIG. 5L of the thread context 104 executing the interrupt vector code.
- the IP bits 547 / 548 of the Cause Register 536 of FIG. 5P indicate which interrupt request sources are currently active.
- the interrupt vector code saves the contents of the Cause Register 536 to an entry in a table similar to the page table origin or kernel stack pointer tables of FIG. 19 that is indexed by a shifted version of the TCBind Register 556 of FIG. 5K , as described below with respect to FIG. 19 .
- Flow proceeds to block 1806 .
- the interrupt vector code masks off the currently active interrupt sources indicated in the Cause Register 536 by setting the corresponding IM bits 573 in the Status Register 571 of FIG. 5M of the VPE 102 . Flow proceeds to block 1808 .
- the interrupt vector code clears the EXL bit 576 , which ceases to disable the VPE 102 from taking interrupts which were activated at block 1802 . Flow proceeds to block 1812 .
- the interrupt vector code decodes the interrupt sources based on the Cause Register 536 contents and transfers control to the appropriate interrupt handlers registered to handle interrupts for the specific types of active interrupt sources. Flow proceeds to block 1814 .
- the interrupt source-specific interrupt handler clears the interrupt source and services the interrupt source. Flow proceeds to block 1816 .
- the interrupt handler invokes the common return from interrupt code to restore the context and return from the interrupt. Flow proceeds to block 1818 .
- the return from interrupt code reads the TCContext Register 595 and unmasks the interrupt sources indicated therein as previously having been inactive by clearing the corresponding IM bits 573 in the Status Register 571 . Flow ends at block 1818 .
- a kernel variable in memory could be used instead of the TCContext Register 595 to save the Cause Register 536 contents.
- using the TCContext Register 595 is more efficient, and is particularly appropriate in an embodiment in which the value must be saved and restored on a context switch.
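- By way of illustration only, the bookkeeping of FIG. 18 may be modeled in C as follows. The structures vpe_state and tc_state and the functions irq_entry( ), irq_return( ), and can_take( ) are hypothetical. The sketch tracks masked interrupt sources as a plain set and makes no assumption about how that set is encoded in the IM bits 573; dispatch to the source-specific handlers (blocks 1812 through 1816) is left to the caller.

```c
#include <stdint.h>

/* Hypothetical per-VPE interrupt state. */
struct vpe_state {
    uint8_t pending;  /* active interrupt sources (Cause IP bits) */
    uint8_t masked;   /* sources currently masked off */
    int     exl;      /* nonzero: VPE may not take interrupts */
};

/* Hypothetical per-TC state: only the TCContext Register 595. */
struct tc_state {
    uint32_t tccontext;
};

/* Blocks 1804 through 1808, run by the TC selected to service the request. */
uint8_t irq_entry(struct vpe_state *v, struct tc_state *tc)
{
    tc->tccontext = v->pending;  /* block 1804: save the Cause contents */
    v->masked |= v->pending;     /* block 1806: mask the active sources */
    v->exl = 0;                  /* block 1808: allow other interrupts */
    return (uint8_t)tc->tccontext;
}

/* Block 1818: unmask the sources recorded at entry. */
void irq_return(struct vpe_state *v, const struct tc_state *tc)
{
    v->masked &= (uint8_t)~tc->tccontext;
}

/* Whether the VPE could take an interrupt from source src right now. */
int can_take(const struct vpe_state *v, unsigned src)
{
    return !v->exl && ((v->pending & ~v->masked) & (1u << src)) != 0;
}
```

- Because the sources being serviced remain masked after EXL is cleared, a second thread context of the same VPE cannot be selected to service the same interrupt request instance, while a request from a different source can still be taken.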
- SMTC Linux also provides an SMTC-specific setup_irq( ) routine that SMTC-aware device drivers may invoke to set up their interrupt handlers by passing an additional mask parameter that specifies interrupt sources that the interrupt handler will re-enable explicitly during the servicing of the exception.
- the clock timer device driver in SMTC Linux is SMTC-aware and invokes the SMTC-specific setup_irq( ) routine.
- Referring now to FIG. 19, two flowcharts and two block diagrams illustrating operation of the SMTC operating system on a system 100 of FIG. 13 according to the present invention are shown.
- the flowcharts and block diagrams of FIG. 19 illustrate modifications to the conventional SMP Linux TLB miss handler, get_kernel_sp( ), and set_kernel_sp( ) routines, to accommodate the fact that the Context Register 527 of FIG. 5N, used by the conventional SMP Linux TLB miss handler, get_kernel_sp( ), and set_kernel_sp( ) routines, is instantiated on a per-VPE 102 basis, rather than a per-TC 104 basis.
- Flow begins at block 1902 .
- the VPE 102 invokes the operating system TLB miss handler in response to a TLB miss exception.
- the operating system is responsible for handling TLB 1302 misses. That is, the operating system is responsible for updating the TLB 1302 with the appropriate virtual to physical page translation information if the information is missing in the TLB 1302 . This is in contrast to some processor architectures in which the processor hardware automatically fills the TLB on a TLB miss. Flow proceeds to block 1904 .
- the TLB miss handler reads the TCBind Register 556 of FIG. 5K of the exception causing thread context 104 (which the VPE 102 selects to service the TLB miss exception) and shifts the value right by 19 bits (or 18 bits if dealing with 64-bit quantities) to obtain an offset into a table of 32-bit page table origin values, or page table base address values, and adds the offset to the base address of the table to obtain a pointer to the page table origin of the thread context 104 executing the thread that caused the TLB miss exception, as shown in the corresponding block diagram.
- the base address of the table is fixed at compile time of the operating system. Flow ends at block 1904 .
- Flow of the second flowchart of FIG. 19 begins at block 1912 .
- a thread invokes the operating system get_kernel_sp( ) or set_kernel_sp( ) routine to get or set, respectively, the kernel stack pointer value for the CPU/TC 104 executing the thread.
- Flow proceeds to block 1914 .
- the invoked routine reads the TCBind Register 556 of FIG. 5K of the invoking thread context 104 and shifts the value right by 19 bits (or 18 bits if dealing with 64-bit quantities) to obtain an offset into a table of 32-bit kernel stack pointer values, and adds the offset to the base address of the table to obtain a pointer to the kernel stack pointer, as shown in the corresponding block diagram.
- the base address of the table is fixed at compile time of the operating system. Flow ends at block 1914 .
- conventional SMP Linux for MIPS uses the PTEBase field 542 of the Coprocessor 0 Context Register 527 of FIG. 5N to store a value that may be used as a pointer to CPU-unique values in a system such as the system 1200 of FIG. 12 .
- SMTC operating systems require a per-TC storage location such as TCBind 556 which is provided in system 100 of FIG. 13 for each thread context 104 , rather than a per-VPE 102 storage location, since SMTC operating systems view each thread context 104 as a CPU.
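- By way of illustration only, the table indexing of blocks 1904 and 1914 may be sketched in C as follows. The function names are hypothetical. The sketch assumes that the CurTC field 557 occupies bits 28:21 of the TCBind Register 556, so that a right shift by 19 bits yields the thread context number multiplied by four (the size of a 32-bit entry) and a right shift by 18 bits yields it multiplied by eight. The additional masking of the shifted value is a defensive assumption and is not described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: CurTC field 557 in TCBind bits 28:21 (8 bits wide). */
#define CURTC_WIDTH_MASK 0xFFu

/*
 * Convert a TCBind value into a byte offset into a per-TC table.
 * Shifting right by 19 scales CurTC by 4 (32-bit entries);
 * shifting right by 18 scales it by 8 (64-bit entries).
 */
uint32_t tc_table_offset(uint32_t tcbind, int entries_are_64bit)
{
    if (entries_are_64bit)
        return (tcbind >> 18) & (CURTC_WIDTH_MASK << 3);
    return (tcbind >> 19) & (CURTC_WIDTH_MASK << 2);
}

/*
 * Return a pointer to this TC's slot in a table of 32-bit values, such
 * as the page table origin table or the kernel stack pointer table.
 */
uint32_t *tc_table_slot32(uint32_t *table, uint32_t tcbind)
{
    return &table[tc_table_offset(tcbind, 0) / sizeof(uint32_t)];
}
```

- Because the offset is derived from a register that is instantiated for each thread context 104, each CPU/TC reaches its own table entry without relying on the per-VPE Context Register 527.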
- Referring now to FIG. 20, a flowchart illustrating operation of the SMTC operating system on a system 100 of FIG. 13 according to the present invention is shown.
- the flowchart illustrates modifications to the conventional SMP Linux to accommodate the fact that the thread contexts 104 share a common TLB 1302 .
- TLB 1302 maintenance routines may read and write entries in the shared TLB 1302 ; therefore, the operating system prevents multiple CPU/TCs 104 from maintaining the shared TLB 1302 at the same time.
- the second-level TLB page fault handler performs a TLB probe and re-write sequence and may be invoked at any time due to a user-mode access. Consequently, a software spin-lock is an insufficient arbiter of access to the TLB 1302 management resources.
- Flow begins at block 2002 .
- a thread executing on a CPU/TC 104 invokes a TLB 1302 maintenance routine. Flow proceeds to block 2004 .
- the routine disables interrupts. In one embodiment, the routine disables interrupts only on the executing thread context 104 , such as via a CLI described above. In another embodiment, the routine disables interrupts on the entire VPE 102 to which the thread context 104 is bound by clearing the IE bit 577 of the Status Register 571 of FIG. 5M to disable VPE 102 interrupts. Flow proceeds to block 2006 .
- the routine inhibits multi-VPE 102 operation, i.e., inhibits concurrent execution of threads other than the thread executing the routine. That is, the routine prevents the instruction scheduler 216 from dispatching to the execution units 212 instructions from any of the VPEs 102 of the system 100 other than the VPE 102 to which the thread context 104 executing the routine is bound and from dispatching from any of the thread contexts 104 bound to the VPE 102 except the thread context 104 executing the routine. In one embodiment, the routine executes a MIPS MT ASE DVPE instruction to disable multi-VPE operation. Flow proceeds to block 2008 .
- the routine performs the specified TLB 1302 maintenance required by the TLB 1302 maintenance routine. Flow proceeds to block 2012 .
- the routine restores the multi-VPE operation state that existed on the system 100 prior to performing the step at block 2006 .
- the routine executes a MIPS MT ASE EVPE instruction to enable multi-VPE operation if that was the previous state. Flow proceeds to block 2014 .
- the routine restores the interrupt enable state that existed on the VPE 102 prior to performing the step at block 2004 .
- the routine clears the IXMT bit 518 in the TCStatus Register 508 of FIG. 5J to enable interrupts for the thread context 104 if that was the previous state.
- the routine sets the IE bit 577 in the Status Register 571 of FIG. 5M to enable VPE 102 interrupts if that was the previous state. Flow ends at block 2014 .
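- By way of illustration only, the save, inhibit, and restore ordering of FIG. 20 may be modeled in C as follows. This is a software model with hypothetical names (smtc_model, tlb_maintenance( ), observed, and observe( )); it does not execute the DVPE or EVPE instructions, and it models only the embodiment in which interrupts are disabled on the executing thread context 104 through the IXMT bit 518.

```c
/* Hypothetical model of the state touched by a TLB maintenance routine. */
struct smtc_model {
    int ixmt;       /* nonzero: interrupts inhibited on this TC */
    int multi_vpe;  /* nonzero: concurrent (multi-VPE) execution enabled */
};

typedef void (*tlb_op_fn)(void *arg);

/* Run op with interrupts and concurrent execution inhibited. */
void tlb_maintenance(struct smtc_model *m, tlb_op_fn op, void *arg)
{
    int saved_ixmt = m->ixmt;       /* block 2004: save, then disable */
    int saved_mvpe;

    m->ixmt = 1;
    saved_mvpe = m->multi_vpe;      /* block 2006: save, then inhibit */
    m->multi_vpe = 0;

    op(arg);                        /* block 2008: the maintenance itself */

    m->multi_vpe = saved_mvpe;      /* block 2012: restore prior state */
    m->ixmt = saved_ixmt;           /* block 2014: restore prior state */
}

/* Example operation that records the state it observes while running. */
struct observed {
    const struct smtc_model *m;
    int ixmt_seen;
    int mvpe_seen;
};

void observe(void *arg)
{
    struct observed *o = arg;

    o->ixmt_seen = o->m->ixmt;
    o->mvpe_seen = o->m->multi_vpe;
}
```

- Restoring the saved values, rather than unconditionally enabling, allows the routine to be invoked from a context in which interrupts or multi-VPE operation were already disabled.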
- Referring now to FIG. 21, a flowchart illustrating operation of the SMTC operating system on a system 100 of FIG. 13 according to the present invention is shown.
- the flowchart illustrates modifications to conventional SMP Linux to accommodate the fact that the thread contexts 104 share a common ASID cache 1304 .
- In the conventional system 1200 of FIG. 12, each CPU has its own TLB 1202 and its own ASID cache 1204; however, in an SMTC Linux system 100, all of the CPUs/TCs 104 share a common TLB 1302. Therefore, SMTC Linux must ensure that the same ASID is not assigned to two different memory maps concurrently in use on two different CPUs/TCs 104. Otherwise, the shared TLB 1302 might return the incorrect address translation information for the thread executing on one of the CPUs/TCs 104. This is because, as discussed above, the tags in the TLB 1302 are a concatenation of the ASID and the virtual page number being accessed.
- SMTC Linux shares a common ASID cache 1304 across all CPUs/TCs 104 , and serializes use and update of the shared ASID cache 1304 by suspending thread scheduling during the read-modify-write operation of the ASID cache 1304 that is performed when obtaining a new ASID value from the ASID cache 1304 .
- Flow begins at block 2102 .
- a thread executing on a thread context 104 requires a new ASID for a memory map for a particular CPU/TC 104 .
- the most common situations in which a new ASID is required for a memory map are when a new memory map is being created or when an ASID generation rollover occurs, as described below.
- a thread is being scheduled to run on a CPU/TC 104 , i.e., the thread is being swapped in to the CPU/TC 104 by the operating system.
- the operating system loads the general purpose registers 224 of FIG. 2 with the previously saved or initial GPR 224 values and loads the program counter 222 of FIG. 2 of the CPU/TC 104 with the previously saved or initial address of the thread.
- the operating system looks at which process is associated with the thread being scheduled and which memory map is associated with the process.
- the operating system data structure describing the memory map contains an array of ASID values. Normally, the operating system takes the ASID value from the data structure entry indexed by the CPU number of the CPU/TC 104 scheduling the thread and loads the ASID value into the EntryHi Register 526 of FIG. 5N . However, if the operating system detects that the ASID value obtained from the data structure entry belongs to a previous generation, then the operating system obtains a new ASID for the memory map for the CPU/TC 104 according to FIG. 21 , and programs the EntryHi Register 526 with the new ASID instead of the ASID obtained from the data structure. Flow proceeds to block 2104 .
- the operating system gains exclusive access to the shared ASID cache 1304 .
- the step at block 2104 is performed by disabling interrupts and disabling multi-VPE operation as described with respect to blocks 2004 and 2006 , respectively, of FIG. 20 .
- An example of the step performed at block 2104 is found at lines 274 - 281 of the source code listing. Flow proceeds to block 2106 .
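- The save-and-restore pattern implied by this step can be modeled in Python as below. `TCModel` and `exclusive_asid_cache_access` are hypothetical names, and the two boolean flags merely stand in for the interrupt-enable and multi-VPE state; the sketch shows only that the previous states are restored rather than unconditionally re-enabled.

```python
from contextlib import contextmanager


class TCModel:
    """Toy stand-in for the two pieces of state that are disabled."""

    def __init__(self, interrupts_enabled=True, multi_vpe_enabled=True):
        self.interrupts_enabled = interrupts_enabled
        self.multi_vpe_enabled = multi_vpe_enabled


@contextmanager
def exclusive_asid_cache_access(tc):
    # Remember the previous states so that a caller that entered with
    # interrupts already disabled leaves with them still disabled.
    saved_irq = tc.interrupts_enabled
    saved_mvpe = tc.multi_vpe_enabled
    tc.interrupts_enabled = False
    tc.multi_vpe_enabled = False
    try:
        yield tc
    finally:
        tc.interrupts_enabled = saved_irq
        tc.multi_vpe_enabled = saved_mvpe
```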
- the operating system increments the current ASID cache 1304 value to obtain the new ASID value.
- An example of the step performed at block 2106 is found at lines 282 and 285 of the source code listing. Flow proceeds to decision block 2108 .
- the operating system determines whether the ASID cache 1304 value rolled over to a new generation when it was incremented at block 2106 .
- the ASID cache 1304 rolls over to a new generation as follows.
- the ASID cache 1304 value is maintained as a 32-bit value.
- the TASID bits 528 of the TCStatus Register 508 of FIG. 5J and the ASID bits 538 of the Coprocessor 0 EntryHi Register 526 of FIG. 5N are physically only 8 bits.
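- A minimal Python sketch of the arithmetic follows. It assumes, for illustration, that the low 8 bits of the 32-bit value are the hardware ASID and that the upper bits act as a generation number; the function names are hypothetical.

```python
ASID_MASK = 0xFF            # the hardware ASID fields are 8 bits wide
WORD_MASK = 0xFFFFFFFF      # the ASID cache value is kept as a 32-bit quantity


def next_asid(asid_cache):
    """Increment the cache value and report whether the 8-bit field wrapped."""
    new_value = (asid_cache + 1) & WORD_MASK
    rolled_over = (new_value & ASID_MASK) == 0
    return new_value, rolled_over


def split(value):
    """Return (generation, hardware_asid) under the assumed layout."""
    return value >> 8, value & ASID_MASK
```

- Under this layout, incrementing 0x1FF yields 0x200: the hardware ASID wraps to zero and the generation advances from 1 to 2, which is the rollover condition tested at decision block 2108.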
- the operating system updates a live ASID table.
- the operating system updates the new ASID to the first ASID generation value and flushes the shared TLB 1302 .
- a live ASID is an ASID that is in use by another CPU/TC 104 .
- the live ASID table indicates, for each ASID, which CPUs/TCs 104 , if any, are currently using the ASID.
- the operating system updates the live ASID table by reading the TASID field 528 of the TCStatus Register 508 of FIG. 5J to determine the ASID currently being used by each CPU/TC 104 , which may advantageously be performed by a series of MFTR instructions 300 in the operating system thread that updates the live ASID table.
- the operating system avoids obtaining a new ASID that is the same as a live ASID in order to avoid potentially using the same physical ASID value to identify two different memory maps, which might cause the TLB 1302 to produce incorrect page translations, as discussed above.
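- One possible shape for the live ASID table is sketched below in Python. The representation (a list indexed by ASID whose entries are bitmasks of thread contexts) and the function names are assumptions; on hardware the TASID values would be read with MFTR instructions, whereas here they are passed in as plain data.

```python
def build_live_asid_table(tasid_by_tc, num_asids=256):
    """Return a list indexed by ASID; each entry is a bitmask of the TCs using it.

    tasid_by_tc -- list where element i is the TASID value read for TC i
    """
    table = [0] * num_asids
    for tc, asid in enumerate(tasid_by_tc):
        table[asid] |= 1 << tc
    return table


def is_live(table, asid, requesting_tc):
    """True if any TC other than the requester is currently using this ASID."""
    return bool(table[asid] & ~(1 << requesting_tc))
```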
- Although the operating system flushes the shared TLB 1302 when an ASID generation rollover occurs, the TASID field 528 of the TCStatus Register 508 of the various thread contexts 104 may still be populated with old generation ASIDs, and could therefore generate new TLB 1302 entry allocations/fills that have old generation ASIDs in their tags.
- An example of the step of updating the live ASID table performed at block 2112 is found at lines 304 - 305 of the source code listing.
- An example of the step of updating the new ASID to the first ASID generation value performed at block 2112 is found at line 310 of the source code listing.
- An example of the step of flushing the shared TLB 1302 performed at block 2112 is found at line 311 of the source code listing. Flow proceeds to block 2116 .
- the operating system determines whether the new ASID is equal to a live ASID. An example of the step performed at decision block 2114 is found at line 313 of the source code listing. If the new ASID is equal to a live ASID, flow returns to block 2106 so that the operating system can attempt to obtain a new non-live ASID; otherwise, flow proceeds to block 2116 .
- the operating system assigns the new ASID to the memory map for all CPUs/TCs 104 in the system 100 .
- SMTC Linux uses the asid_cache storage space in the original fields 1402 effectively as a single ASID cache 1304 by updating each asid_cache field in each cpu_data array entry 1408 even when generating a new ASID value for only a single CPU/TC 104 ; however, other embodiments are contemplated in which a single kernel variable is used to store the single ASID cache 1304 .
- the operating system advantageously assigns the new ASID to the memory map for all CPUs/TCs 104 in order to make more efficient use of the shared TLB 1302 , i.e., to avoid the following situation. Assume two processes share a common memory map and execute on different CPUs/TCs 104 . In a conventional SMP Linux system 1200 , the memory map would be assigned a different ASID for each CPU, since each CPU has its own ASID cache 1204 .
- If the same per-CPU assignment were used with the shared TLB 1302 , then each time one of the processes accessed the shared page, the operating system would allocate an entry in the shared TLB 1302 for the page translation, since the ASID value differed for each CPU/TC 104 ; i.e., two TLB 1302 entries would be consumed for the same shared physical page, which would be an inefficient use of the shared TLB 1302 entries.
- a similar inefficiency could occur when a process was migrated from one CPU/TC 104 to another.
- SMTC Linux assigns the new ASID to the memory map not only for the CPU/TC 104 for which it was obtained, but also causes the new ASID to be assigned to and used by all CPUs/TCs 104 that reference the memory map.
- When the operating system assigns a new ASID to a memory map, if a process uses the memory map, then all threads of the process that use the memory map use the new ASID on all CPUs/TCs 104 that execute the threads.
- any TLB 1302 entries that were loaded as a result of the thread executing on one CPU/TC 104 will be valid and usable on any other CPU/TC 104 to which the thread subsequently migrates, which would not be the case if the operating system maintained a distinct ASID cache per CPU, as in conventional SMP Linux.
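- The efficiency argument can be checked with a few lines of Python. The function name and the ASID and page numbers are hypothetical; the sketch simply counts how many distinct (ASID, virtual page number) tags result when every CPU/TC touches the same shared page.

```python
def tlb_entries_used(asid_per_cpu, vpn):
    """Count distinct (ASID, VPN) tags created when each CPU/TC touches one page."""
    tags = {(asid, vpn) for asid in asid_per_cpu}
    return len(tags)
```

- With a different ASID per CPU/TC, e.g. `tlb_entries_used([5, 9], 0x10)`, two entries are consumed; with a common ASID, e.g. `tlb_entries_used([5, 5], 0x10)`, a single entry serves both.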
- An example of the step performed at block 2116 is found at line 320 of the source code listing. Flow proceeds to block 2118 .
- the operating system relinquishes exclusive access to the shared ASID cache 1304 .
- the step at block 2118 is performed by restoring interrupts and multi-VPE operation to their previous states, as described with respect to blocks 2012 and 2014 , respectively, of FIG. 20 .
- An example of the step performed at block 2118 is found at lines 324 - 329 of the source code listing. Flow ends at block 2118 .
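- The flow of FIG. 21 as described above can be summarized in a simplified Python model. This is a sketch under stated assumptions, not the source code listing: the class and attribute names are hypothetical, the first generation value of 0x100 is assumed, the live table is rebuilt from scratch on rollover, and the steps of gaining and relinquishing exclusive access (blocks 2104 and 2118) are omitted.

```python
ASID_MASK = 0xFF
WORD_MASK = 0xFFFFFFFF
FIRST_GENERATION = 0x100   # assumed value of the first ASID generation


class AsidAllocator:
    def __init__(self, num_tcs):
        self.asid_cache = FIRST_GENERATION
        self.live = [0] * (ASID_MASK + 1)   # per-ASID bitmask of TCs using it
        self.tasid = [0] * num_tcs          # model of each TC's TASID field
        self.tlb_flushes = 0                # counts flushes of the shared TLB

    def get_new_asid(self, mm_asids, cpu):
        while True:
            # Block 2106: increment the shared ASID cache.
            asid = (self.asid_cache + 1) & WORD_MASK
            self.asid_cache = asid
            # Decision block 2108: did the 8-bit hardware field wrap?
            if (asid & ASID_MASK) == 0:
                # Block 2112: record ASIDs in use by the other TCs ...
                self.live = [0] * (ASID_MASK + 1)
                for tc, in_use in enumerate(self.tasid):
                    if tc != cpu:
                        self.live[in_use] |= 1 << tc
                # ... restart at the first generation if the whole word wrapped ...
                if asid == 0:
                    asid = self.asid_cache = FIRST_GENERATION
                # ... and flush the shared TLB.
                self.tlb_flushes += 1
            # Decision block 2114: retry while the candidate is a live ASID.
            if not self.live[asid & ASID_MASK]:
                break
        # Block 2116: assign the new ASID to the memory map for all CPUs/TCs.
        for i in range(len(mm_asids)):
            mm_asids[i] = asid
        return asid
```

- In this model the retry loop terminates because at most one hardware ASID per other thread context can be live, which is far fewer than the 256 available values.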
- Although embodiments have been described in which the modified SMP OS is Linux, other SMP operating systems are contemplated for adaptation to run on a multithreading microprocessor having non-independent lightweight thread contexts that share processor state with one another, such as MIPS MT ASE thread contexts, each of which is an independent CPU to the SMP OS.
- other variants of the UNIX operating system such as SUN Solaris, HP UX, Mac OS X, Open VMS, and others may be adapted to view the thread contexts as a CPU.
- SMP operating systems such as SMP-capable variants of the Microsoft Windows operating system may be adapted to view the thread contexts as a CPU.
- Although the invention has been described with respect to modifications to an existing SMP operating system, the invention is not limited to existing operating systems; rather, new operating systems may be developed which employ the steps described to use non-independent lightweight thread contexts that share processor state with one another, such as MIPS MT ASE thread contexts, as independent CPUs to the new SMP OS.
- implementations may also be embodied in software (e.g., computer readable code, program code, instructions and/or data disposed in any form, such as source, object or machine language) disposed, for example, in a computer usable (e.g., readable) medium configured to store the software.
- Such software can enable, for example, the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein.
- this can be accomplished through the use of general programming languages (e.g., C, C++), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available programs and databases.
- Such software can be disposed in any known computer usable medium such as semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.).
- the software can also be disposed as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., carrier wave or any other medium including digital, optical, or analog-based medium).
- Embodiments of the present invention may include methods of providing operating system software described herein by providing the software and subsequently transmitting the software as a computer data signal over a communication network including the Internet and intranets, such as shown in FIGS. 22 through 24 .
- the apparatus and method described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL) and transformed to hardware in the production of integrated circuits.
- the apparatus and methods described herein may be embodied as a combination of hardware and software.
- the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
Abstract
Description
- This application is a continuation-in-part (CIP) of the following co-pending Non-Provisional U.S. Patent Applications, which are hereby incorporated by reference in their entirety for all purposes:
| Ser. No. (Docket No.) | Filing Date | Title |
| --- | --- | --- |
| 11/313,272 (MIPS.0214-00-US) | Dec. 20, 2005 | SOFTWARE EMULATION OF DIRECTED EXCEPTIONS IN A MULTITHREADING PROCESSOR |
| 11/313,296 (MIPS.0214-01-US) | Dec. 20, 2005 | PREEMPTIVE MULTITASKING EMPLOYING SOFTWARE EMULATION OF DIRECTED EXCEPTIONS IN A MULTITHREADING PROCESSOR |

- Each of the two above co-pending Non-Provisional U.S. Patent Applications is a continuation-in-part (CIP) of the following co-pending Non-Provisional U.S. Patent Application, which is hereby incorporated by reference in its entirety for all purposes:

| Ser. No. (Docket No.) | Filing Date | Title |
| --- | --- | --- |
| 10/929,097 (MIPS.0194-00-US) | Aug. 27, 2004 | APPARATUS, METHOD, AND INSTRUCTION FOR SOFTWARE MANAGEMENT OF MULTIPLE COMPUTATIONAL CONTEXTS IN A MULTITHREADED MICROPROCESSOR |

- Co-pending Non-Provisional U.S. patent application Ser. No. 10/929,097 (MIPS.0194-00-US) is a continuation-in-part (CIP) of the following co-pending Non-Provisional U.S. Patent Applications, which are hereby incorporated by reference in their entirety for all purposes:

| Ser. No. (Docket No.) | Filing Date | Title |
| --- | --- | --- |
| 10/684,350 (MIPS.0188-01-US) | Oct. 10, 2003 | MECHANISMS FOR ASSURING QUALITY OF SERVICE FOR PROGRAMS EXECUTING ON A MULTITHREADED PROCESSOR |
| 10/684,348 (MIPS.0189-00-US) | Oct. 10, 2003 | INTEGRATED MECHANISM FOR SUSPENSION AND DEALLOCATION OF COMPUTATIONAL THREADS OF EXECUTION IN A PROCESSOR |

- Each of co-pending Non-Provisional U.S. patent application Ser. No. 10/684,350 (MIPS.0188-01-US) and Ser. No. 10/684,348 (MIPS.0189-00-US) claims the benefit of the following expired U.S. Provisional Applications, which are hereby incorporated by reference in their entirety for all purposes:

| Ser. No. (Docket No.) | Filing Date | Title |
| --- | --- | --- |
| 60/499,180 (MIPS.0188-00-US) | Aug. 28, 2003 | MULTITHREADING APPLICATION SPECIFIC EXTENSION |
| 60/502,358 (MIPS.0188-02-US) | Sep. 12, 2003 | MULTITHREADING APPLICATION SPECIFIC EXTENSION TO A PROCESSOR ARCHITECTURE |
| 60/502,359 (MIPS.0188-03-US) | Sep. 12, 2003 | MULTITHREADING APPLICATION SPECIFIC EXTENSION TO A PROCESSOR ARCHITECTURE |

- A computer program listing appendix, which is hereby incorporated by reference in its entirety for all purposes, is submitted via the USPTO electronic filing system (EFS) in a text file named cpl-mips-0214-03-US.txt that contains a 665 line computer program listing of C language and assembly language source code.
- The present invention relates in general to the field of multithreaded microprocessors, and particularly to execution of multiprocessor operating systems thereon.
- Microprocessor designers employ many techniques to increase microprocessor performance. Most microprocessors operate using a clock signal running at a fixed frequency. Each clock cycle the circuits of the microprocessor perform their respective functions. According to Hennessy and Patterson (see Computer Architecture: A Quantitative Approach, 3rd Edition), the true measure of a microprocessor's performance is the time required to execute a program or collection of programs. From this perspective, the performance of a microprocessor is a function of its clock frequency, the average number of clock cycles required to execute an instruction (or alternately stated, the average number of instructions executed per clock cycle), and the number of instructions executed in the program or collection of programs. Semiconductor scientists and engineers are continually making it possible for microprocessors to run at faster clock frequencies, chiefly by reducing transistor size, resulting in faster switching times. The number of instructions executed is largely fixed by the task to be performed by the program, although it is also affected by the instruction set architecture of the microprocessor. Large performance increases have been realized by architectural and organizational notions that improve the instructions per clock cycle, in particular by notions of parallelism.
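- The relationship among the three factors named above can be written as a one-line Python function. The function name and the sample numbers in the note that follows are illustrative only.

```python
def execution_time_seconds(instruction_count, cycles_per_instruction, clock_hz):
    """Execution time = instructions x average cycles per instruction / clock frequency."""
    return instruction_count * cycles_per_instruction / clock_hz
```

- For instance, one million instructions at an average of 2 cycles per instruction on a 500 MHz clock take about 4 milliseconds; halving the cycles per instruction halves the time just as doubling the clock frequency would.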
- One notion of parallelism that has improved the clock frequency of microprocessors is pipelining, which overlaps execution of multiple instructions within pipeline stages of the microprocessor. In an ideal situation, each clock cycle one instruction moves down the pipeline to a new stage, which performs a different function on the instruction. Thus, although each individual instruction takes multiple clock cycles to complete, the multiple cycles of the individual instructions overlap. Because the circuitry of each individual pipeline stage is only required to perform a small function relative to the sum of the functions required to be performed by a non-pipelined processor, the clock cycle of the pipelined processor may be reduced. The performance improvements of pipelining may be realized to the extent that the instructions in the program permit it, namely to the extent that an instruction does not depend upon its predecessors in order to execute and can therefore execute in parallel with its predecessors, which is commonly referred to as instruction-level parallelism. Another way in which instruction-level parallelism is exploited by contemporary microprocessors is the issuing of multiple instructions for execution per clock cycle. These microprocessors are commonly referred to as superscalar microprocessors.
- What has been discussed above pertains to parallelism at the individual instruction-level. However, the performance improvement that may be achieved through exploitation of instruction-level parallelism is limited. Various constraints imposed by limited instruction-level parallelism and other performance-constraining issues have recently renewed an interest in exploiting parallelism at the level of blocks, or sequences, or streams of instructions, commonly referred to as thread-level parallelism. A thread is simply a sequence, or stream, of program instructions. A multithreaded microprocessor concurrently executes multiple threads according to some scheduling policy that dictates the fetching and issuing of instructions of the various threads, such as interleaved, blocked, or simultaneous multithreading. A multithreaded microprocessor typically allows the multiple threads to share the functional units of the microprocessor (e.g., instruction fetch and decode units, caches, branch prediction units, and load/store, integer, floating-point, SIMD, etc. execution units) in a concurrent fashion. However, multithreaded microprocessors include multiple sets of resources, or contexts, for storing the unique state of each thread, such as multiple program counters and general purpose register sets, to facilitate the ability to quickly switch between threads to fetch and issue instructions. In other words, because each thread context has its own program counter and general purpose register set, the multithreading microprocessor does not have to save and restore these resources when switching between threads, thereby potentially reducing the average number of clock cycles per instruction.
- One example of a performance-constraining issue addressed by multithreading microprocessors is the fact that accesses to memory outside the microprocessor that must be performed due to a cache miss typically have a relatively long latency. It is common for the memory access time of a contemporary microprocessor-based computer system to be between one and two orders of magnitude greater than the cache hit access time. Instructions dependent upon the data missing in the cache are stalled in the pipeline waiting for the data to come from memory. Consequently, some or all of the pipeline stages of a single-threaded microprocessor may be idle performing no useful work for many clock cycles. Multithreaded microprocessors may solve this problem by issuing instructions from other threads during the memory fetch latency, thereby enabling the pipeline stages to make forward progress performing useful work, somewhat analogously to, but at a finer level of granularity than, an operating system performing a task switch on a page fault. Other examples of performance-constraining issues addressed by multithreading microprocessors are pipeline stalls and their accompanying idle cycles due to a data dependence; or due to a long latency instruction such as a divide instruction, floating-point instruction, or the like; or due to a limited hardware resource conflict. Again, the ability of a multithreaded microprocessor to issue instructions from independent threads to pipeline stages that would otherwise be idle may significantly reduce the time required to execute the program or collection of programs comprising the threads.
- Multiprocessing is a technique related to multithreading that exploits thread-level parallelism, albeit at a higher system level, to execute a program or collection of programs faster. In a conventional multiprocessor system, multiple processors, or CPUs, share a memory system and I/O devices. A multiprocessor (MP) operating system facilitates the simultaneous execution of a program or collection of programs on the multiprocessor system. For example, the system may include multiple Pentium IV processors all sharing a memory and I/O subsystem running an MP operating system—such as Linux SMP, an MP-capable version of Windows, Sun Solaris, etc., and executing one or more application programs concurrently.
- Multithreading microprocessors exploit thread-level parallelism at an even lower level than multiprocessor systems by sharing instruction fetch, issue, and execution resources, as described above, in addition to sharing a memory system and I/O devices. An MP operating system may run on a multithreading microprocessor if the multithreading microprocessor presents multiple processors, or CPUs, in an architected manner recognized by the MP operating system. Perhaps the most highly publicized example is the Hyper-Threading (HT) Technology employed in the Intel® Xeon® multithreading microprocessor. An HT Xeon includes effectively the same execution resources (e.g., caches, execution units, branch predictors) as a non-HT Xeon processor, but replicates the architectural state to present multiple distinct logical processors to an MP OS. That is, the MP operating system recognizes each logical processor as a separate processor, or CPU, each presenting the architecture of a single processor. The cost of replicating the architectural state for the additional logical processor in the Xeon in terms of additional chip size and power consumption is almost 5%.
- One aspect of the architecture presented by each of the multiple processors to the MP operating system is the ability to handle a list of architected exceptions. Generally speaking, an exception is an error or other unusual condition or event that occurs during the execution of a program. In response to an exception, the processor saves the state of the currently executing program and begins fetching and executing instructions at a predefined address, thereby transferring execution to an alternate program, commonly referred to as an exception handler located at the predefined address. The predefined address may be common to all exceptions in the list of architected exception types or may be unique to some or all of the exception types. The exception handler, when appropriate, may restore the state and resume execution of the previously executing program. Examples of common exceptions include a page fault, a divide by zero, a faulty address generated by the program, a bus error encountered by the processor when attempting to read a memory location, or an invalid instruction exception caused by an invalid instruction opcode or invalid instruction operand.
- Another common exception type is an interrupt, or interrupt request. Interrupts are typically grouped as hardware interrupts and software interrupts. A software interrupt is generated when the currently executing program executes an architected software interrupt instruction, which causes an exception that transfers control to the architected interrupt vector associated with the software interrupt to invoke an interrupt service routine, or handler. A hardware interrupt is a signal received by the processor from a device to request service by the processor. Examples of interrupting devices are disk drives, direct memory access controllers, and timers. In response to the interrupt request, the processor transfers control to an architected interrupt vector associated with the interrupt request to invoke an interrupt service routine, or handler.
- One function which MP operating systems need to be able to perform is for one processor, or CPU, to interrupt the operation of another specific one of the processors, and in some cases to interrupt all the processors in the system. These operations are sometimes referred to as inter-processor interrupts (IPIs). Commonly in a multiprocessor system, each processor includes an interrupt controller, which enables each processor to direct an interrupt specifically to each of the other processors. The HT Xeon processors, for example, include a replicated Advanced Programmable Interrupt Controller (APIC) for each logical processor, which enables each logical processor to send a hardware interrupt specifically to each of the other logical processors.
- An example of the use of an IPI is in preemptive time-sharing operating systems, which receive periodic timer interrupts, in response to which the operating system may perform a task switch on one or more of the processors to schedule a different task or process to execute on the processors. In Linux SMP, for example, the timer handling routine running on the processor that receives the timer interrupt not only schedules the tasks on its own processor, but also directs an interrupt to each of the other processors to cause them to schedule their tasks. Each processor has an architected interrupt mechanism, which the timer interrupt-receiving processor uses to direct an IPI to each of the other processors in the multiprocessor system.
- Another multithreading microprocessor core architecture which takes a somewhat different approach than, for example, the Intel HT architecture is the MIPS® Multithreading (MT) Application-Specific Extension (ASE) of the MIPS Instruction Set Architecture (ISA) and MIPS Privileged Resource Architecture (PRA). The MIPS MT ASE allows two distinct, but not mutually-exclusive, multithreading capabilities. A single MIPS MT ASE microprocessor core comprises one or more Virtual Processing Elements (VPEs), and each VPE comprises one or more thread contexts (TCs). This architecture is described in the document MIPS32® Architecture for Programmers Volume IV-f: The MIPS® MT Application-Specific Extension (ASE) to the MIPS32® Architecture, Document Number: MD00378, Revision 1.00, Sep. 28, 2005, available from MIPS Technologies, 1225 Charleston Road, Mountain View, Calif. 94043-1353, which is hereby incorporated by reference in its entirety for all purposes. Embodiments of the architecture are also described in the above-referenced U.S. Patent Applications.
- In the MIPS MT ASE architecture, an N-VPE processor core presents to an SMP operating system an N-way symmetric multiprocessor. In particular, it presents to the SMP operating system N MIPS32® Architecture processors. Thus, SMP operating systems configured to run on a conventional multiprocessor system having N MIPS32 processors without the MT ASE capability will run on a single MIPS32 core with the MT ASE capabilities with little or no modifications to the SMP operating system. In particular, each VPE presents an architected exception domain to the SMP operating system including an architected list of exceptions that the VPE will handle. The list includes interrupts that one VPE may direct to another specific VPE in the multithreading microprocessor, somewhat similar to the HT Xeon approach.
- As mentioned above, each VPE comprises at least one thread context, and may comprise multiple thread contexts. A thread context in the MIPS MT ASE comprises a program counter representation, a set of general purpose registers, a set of multiplier result registers, and some of the MIPS PRA Coprocessor 0 state, such as state describing the execution privilege level and address space identifier (ASID) of each thread context. The thread contexts are relatively lightweight compared to VPEs with respect to storage elements required to store state and are therefore less expensive than VPEs in terms of chip area and power consumption. Advantageously, the lightweight feature of MIPS MT ASE thread contexts makes them inherently more scalable than VPEs, and potentially than Intel HT logical processors, for example.
- In particular, in the interest of providing lightweight thread contexts and the concomitant advantages, such as improved scalability, within the MIPS MT ASE, the domain for exception handling is at the VPE level, not the thread context level. In particular, a VPE handles asynchronous exceptions, such as interrupts, opportunistically. That is, when an asynchronous exception is raised to the VPE, the VPE selects one of the eligible (i.e., not marked as exempt from servicing asynchronous exceptions) thread contexts to execute the exception handler. Thus, although there is an architected means for a thread context to direct an asynchronous exception to a VPE, the thread context cannot specify to the VPE which thread context should handle the exception within the VPE in a MIPS MT ASE processor, i.e., the exception architecture does not provide an explicit way for the thread context to direct an asynchronous exception to a specific other thread context. This is a problem, particularly with MP operating systems, such as Linux SMP, that rely on the ability of one CPU to direct an inter-processor interrupt to another CPU in response to a timer interrupt request in order to accomplish preemptive multitasked process scheduling.
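- The opportunistic selection described above can be modeled in a few lines of Python. The `VPE` class and the `choose` parameter are hypothetical; `choose` stands in for whatever selection the hardware makes among eligible thread contexts, and the sender of the exception has no input into it.

```python
class VPE:
    def __init__(self, num_tcs):
        # True means the TC is marked exempt from servicing asynchronous exceptions.
        self.exempt = [False] * num_tcs

    def eligible_tcs(self):
        return [tc for tc, is_exempt in enumerate(self.exempt) if not is_exempt]

    def raise_async_exception(self, choose):
        """Return the TC that services the exception, or None if none is eligible.

        Note that there is no 'target TC' argument: the exception is directed
        at the VPE, and the VPE picks the servicing thread context.
        """
        candidates = self.eligible_tcs()
        if not candidates:
            return None
        return choose(candidates)
```

- Because the interface offers no way to name a target thread context, an inter-processor interrupt aimed at one specific "CPU" cannot be expressed directly in this model, which mirrors the problem stated above.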
- In accordance with the goal of providing high scalability of MIPS MT thread contexts, not only is the interrupt controller not replicated for each thread context, i.e., the exception domain is at the VPE level rather than at the thread context level, but other resources in a MIPS MT processor core also may not be replicated for each thread context. For example, each thread context may not have its own translation lookaside buffer (TLB) or floating point coprocessor.
- Various MP operating systems have been developed to run on multiprocessor systems in which the multiple processors are MIPS architecture processors. As mentioned above, an SMP operating system running on a conventional multiprocessor system having N non-MT MIPS processors that views the system as having N CPUs will likewise view a single MIPS32 MT N-VPE microprocessor core as having N CPUs and run with little or no modifications to the SMP operating system. However, the existing MP operating systems do not have knowledge of the MIPS MT VPE/TC hierarchy, and in particular because a MIPS MT thread context is much more lightweight than a MIPS MT VPE and does not provide all the architectural state of a VPE, the existing MP operating systems do not view a MIPS MT core having M thread contexts as an M CPU system. However, it would be advantageous to enable the MP operating systems to view a MIPS MT core having M thread contexts as an M CPU system, particularly due to the highly scalable nature of MIPS MT thread contexts to a relatively large number of threads.
- Therefore, what is needed is a means to have each lightweight thread context—to which is replicated less than the full architected CPU state anticipated by an existing MP operating system, such as a MIPS MT ASE thread context—appear as an architected CPU to the MP operating system, such as Linux SMP or other MP derivatives of UNIX-style operating systems.
- The present invention describes modifications to existing SMP operating systems that make highly scalable, lightweight thread contexts within a multithreaded processor, which would normally by themselves be unable to run an image, or instance, of the operating system, function as physical CPUs for the purposes of operating system resource management.
- In one aspect, the present invention provides a multiprocessing system. The system includes a multithreading microprocessor having a plurality of thread contexts (TCs), a translation lookaside buffer (TLB) shared by the plurality of TCs, and an instruction scheduler, coupled to the plurality of TCs, configured to dispatch to execution units, in a multithreaded fashion, instructions of threads executing on the plurality of TCs. The system also includes a multiprocessor operating system (OS), configured to schedule execution of the threads on the plurality of TCs, wherein a thread of the threads executing on one of the plurality of TCs is configured to update the shared TLB, and prior to updating the TLB to disable interrupts, to prevent the OS from unscheduling the TLB-updating thread from executing on the plurality of TCs, and disable the instruction scheduler from dispatching instructions from any of the plurality of TCs except from the one of the plurality of TCs on which the TLB-updating thread is executing.
- In another aspect, the present invention provides a method for a multiprocessor operating system (OS) to run on a multiprocessing system including a multithreading microprocessor having a plurality of thread contexts (TCs), a translation lookaside buffer (TLB) shared by the plurality of TCs, and an instruction scheduler configured to dispatch to execution units instructions of threads executing on the plurality of TCs in a multithreaded fashion. The method includes scheduling execution of the threads on the plurality of TCs, wherein a thread of the threads executing on one of the plurality of TCs is configured for updating the shared TLB, disabling interrupts, prior to the updating the TLB, to prevent the OS from unscheduling the TLB-updating thread from executing on the plurality of TCs, and disabling the instruction scheduler, prior to the updating the TLB, from dispatching instructions from any of the plurality of TCs except from the one of the plurality of TCs on which the TLB-updating thread is executing.
- In another aspect, the present invention provides a computer program product for use with a computing device, the computer program product including a computer usable medium, having computer readable program code embodied in the medium, for causing a method for a multiprocessor operating system (OS) to run on a multiprocessing system including a multithreading microprocessor having a plurality of thread contexts (TCs), a translation lookaside buffer (TLB) shared by the plurality of TCs, and an instruction scheduler configured to dispatch to execution units instructions of threads executing on the plurality of TCs in a multithreaded fashion. The computer readable program code includes first program code for providing a step of scheduling execution of the threads on the plurality of TCs, wherein a thread of the threads executing on one of the plurality of TCs is configured for updating the shared TLB, second program code for providing a step of disabling interrupts, prior to the updating the TLB, to prevent the OS from unscheduling the TLB-updating thread from executing on the plurality of TCs, and third program code for providing a step of disabling the instruction scheduler, prior to the updating the TLB, from dispatching instructions from any of the plurality of TCs except from the one of the plurality of TCs on which the TLB-updating thread is executing.
- In another aspect, the present invention provides a method for providing operating system software for running on a multiprocessing system including a multithreading microprocessor having a plurality of thread contexts (TCs), a translation lookaside buffer (TLB) shared by the plurality of TCs, and an instruction scheduler configured to dispatch to execution units instructions of threads executing on the plurality of TCs in a multithreaded fashion. The method includes providing computer-readable program code describing the operating system software. The program code includes first program code for providing a step of scheduling execution of the threads on the plurality of TCs, wherein a thread of the threads executing on one of the plurality of TCs is configured for updating the shared TLB, second program code for providing a step of disabling interrupts, prior to the updating the TLB, to prevent the OS from unscheduling the TLB-updating thread from executing on the plurality of TCs, and third program code for providing a step of disabling the instruction scheduler, prior to the updating the TLB, from dispatching instructions from any of the plurality of TCs except from the one of the plurality of TCs on which the TLB-updating thread is executing. The method also includes transmitting the computer-readable program code as a computer data signal on a network.
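- The ordering of steps recited in the foregoing aspects (disable interrupts, then restrict the instruction scheduler to the updating TC, then update the shared TLB, then restore) may be illustrated by the following minimal sketch in C. The sketch is a software model only and is not taken from the embodiments: the struct machine type, its fields, the constants NUM_TCS and TLB_ENTRIES, and the functions tc_may_dispatch and tlb_update are hypothetical names introduced here for illustration, and a real implementation would manipulate processor control registers rather than plain variables.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_TCS     4   /* illustrative number of thread contexts (assumption) */
#define TLB_ENTRIES 8   /* illustrative size of the shared TLB (assumption) */

/* Hypothetical software model of the shared state; not real hardware registers. */
struct machine {
    bool     interrupts_enabled;      /* models the interrupt-enable state */
    bool     multithreading_enabled;  /* models the scheduler dispatching from all TCs */
    int      owner_tc;                /* TC running the TLB-updating thread */
    unsigned tlb[TLB_ENTRIES];        /* models the TLB shared by all TCs */
    int      blocked_during_update;   /* TCs that could not dispatch during the update */
};

/* A TC may dispatch if multithreading is on, or if it is the updating TC. */
static bool tc_may_dispatch(const struct machine *m, int tc)
{
    return m->multithreading_enabled || tc == m->owner_tc;
}

/* Update one shared-TLB entry from thread context `tc`. */
static void tlb_update(struct machine *m, int tc, unsigned index, unsigned entry)
{
    bool saved_irq = m->interrupts_enabled;
    bool saved_mt  = m->multithreading_enabled;

    /* Step 1: disable interrupts so the OS cannot unschedule this thread. */
    m->interrupts_enabled = false;

    /* Step 2: stop the scheduler from dispatching any TC except this one. */
    m->owner_tc = tc;
    m->multithreading_enabled = false;

    /* Record how many other TCs are held off while the update is in progress. */
    m->blocked_during_update = 0;
    for (int other = 0; other < NUM_TCS; other++)
        if (!tc_may_dispatch(m, other))
            m->blocked_during_update++;
    assert(!m->interrupts_enabled && tc_may_dispatch(m, tc));

    /* Step 3: update the shared TLB with exclusive access. */
    m->tlb[index % TLB_ENTRIES] = entry;

    /* Step 4: restore the scheduler and interrupt state in reverse order. */
    m->multithreading_enabled = saved_mt;
    m->interrupts_enabled = saved_irq;
}
```

In this model, calling tlb_update from one TC leaves the remaining NUM_TCS - 1 thread contexts unable to dispatch until the update completes, after which the previous interrupt and scheduler state is restored.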
- An advantage of the present invention is that it allows an SMP operating system, configured as if it were running on a relatively large number of symmetric CPUs, to run on a multithreaded processor, because each “CPU” is associated with a thread context that is very lightweight in terms of chip area and power consumption and therefore highly scalable. The thread contexts are lightweight because they do not each comprise the entire architectural state associated with an independent symmetric CPU; rather, the thread contexts have some architectural state replicated to each of them (such as a program counter and general purpose register set), but also share much of the architectural state between them (such as a TLB and interrupt control logic), which requires modifications to the SMP operating system to enable the number of operating system CPUs to be equal to the number of thread contexts. Consequently, an existing body of coarse-grain multithreading technology embodied in SMP operating systems, such as multithreading telematics, robotics, or multimedia applications, may be exploited on such a highly scalable processor core.
- FIG. 1 is a block diagram illustrating a microprocessor according to the present invention.
- FIG. 2 is a block diagram illustrating in more detail the microprocessor of FIG. 1.
- FIG. 3 is a block diagram illustrating an MFTR instruction executed by the microprocessor of FIG. 1 according to the present invention.
- FIG. 4 is a block diagram illustrating an MTTR instruction executed by the microprocessor of FIG. 1 according to the present invention.
- FIG. 5 is a series of block diagrams illustrating various multithreading-related registers of the microprocessor of FIG. 1 according to one embodiment of the present invention.
- FIG. 6 is a block diagram illustrating data paths of the microprocessor for performing the MFTR instruction according to the present invention.
- FIG. 7 is a block diagram illustrating data paths of the microprocessor for performing the MTTR instruction according to the present invention.
- FIG. 8 is a flowchart illustrating operation of the microprocessor to execute the MFTR instruction according to the present invention.
- FIG. 9 is a flowchart illustrating operation of the microprocessor to execute the MTTR instruction according to the present invention.
- FIG. 10 is a flowchart illustrating a method for performing an inter-processor interrupt (IPI) from one thread context to another thread context within a VPE of the microprocessor of FIG. 1 according to the present invention.
- FIG. 11 is a flowchart illustrating a method for performing preemptive process scheduling by a symmetric multiprocessor operating system on the microprocessor of FIG. 1 according to the present invention.
- FIG. 12 is a block diagram illustrating a prior art multiprocessor system.
- FIG. 13 is a block diagram illustrating a multiprocessor system according to the present invention.
- FIG. 14 is a block diagram of a cpu_data array entry in an SMTC Linux operating system according to the present invention.
- FIG. 15 is a flowchart illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 16 is two flowcharts illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 17 is three flowcharts illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 18 is a flowchart illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 19 is two flowcharts and two block diagrams illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 20 is a flowchart illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIG. 21 is a flowchart illustrating operation of the SMTC operating system on a system of FIG. 13 according to the present invention.
- FIGS. 22 through 24 are flowcharts illustrating a method for providing software for performing the steps of the present invention and subsequently transmitting the software as a computer data signal over a communication network.
- For a better understanding of exception processing, translation lookaside buffer (TLB) operation, and floating point unit (FPU) coprocessor operation on MIPS architecture processors in general, the reader is referred to MIPS RISC Architecture, by Gerry Kane and Joe Heinrich, published by Prentice Hall, and to See MIPS Run, by Dominic Sweetman, published by Morgan Kaufmann Publishers.
- Embodiments of the present invention are described herein in the context of a processor core that includes the MIPS® MT Application-Specific Extension (ASE) to the MIPS32® Architecture; however, the present invention is not limited to a processor core with said architecture. Rather, the present invention may be implemented in any processor system which includes a plurality of thread contexts for concurrently executing a corresponding plurality of threads, but which does not include an interrupt input for each of the plurality of thread contexts that would allow one thread context to direct an inter-processor interrupt specifically to another thread context.
- Referring now to
FIG. 1, a block diagram illustrating a microprocessor 100 according to the present invention is shown. The microprocessor 100 includes a virtual multiprocessor (VMP) context 108 and a plurality of virtual processing elements (VPEs) 102. Each VPE 102 includes a VPE context 106 and at least one thread context (TC) 104. The VMP context 108 comprises a collection of storage elements, such as registers or latches, and/or bits in the storage elements of the microprocessor 100 that describe the state of execution of the microprocessor 100. In particular, the VMP context 108 stores state related to global resources of the microprocessor 100 that are shared among the VPEs 102, such as the instruction cache 202, instruction fetcher 204, instruction decoder 206, instruction issuer 208, instruction scheduler 216, execution units 212, and data cache 242 of FIG. 2, or other shared elements of the microprocessor 100 pipeline described below. In one embodiment, the VMP context 108 includes the MVPControl Register 501, MVPConf0 Register 502, and MVPConf1 Register 503 of FIGS. 5B-5D described below. - A
thread context 104 comprises a collection of storage elements, such as registers or latches, and/or bits in the storage elements of the microprocessor 100 that describe the state of execution of a thread, and which enable an operating system to manage the resources of the thread context 104. That is, the thread context describes the state of its respective thread, which is unique to the thread, rather than state shared with other threads of execution executing concurrently on the microprocessor 100. A thread, also referred to herein as a thread of execution or instruction stream, is a sequence of instructions. The microprocessor 100 is a multithreading microprocessor. That is, the microprocessor 100 is configured to concurrently execute multiple threads of execution. By storing the state of each thread in the multiple thread contexts 104, the microprocessor 100 is configured to quickly switch between threads to fetch and issue instructions. The elements of a thread context 104 of various embodiments are described below with respect to the remaining Figures. Advantageously, the present microprocessor 100 is configured to execute the MFTR instruction 300 of FIG. 3 and the MTTR instruction 400 of FIG. 4 for moving thread context 104 information between the various thread contexts 104, as described in detail herein. - The
VPE context 106 includes a collection of storage elements, such as registers or latches, and/or bits in the storage elements of the microprocessor 100 that describe the state of execution of a VPE 102, which enable an operating system to manage the resources of the VPE 102, such as virtual memory, caches, exceptions, and other configuration and status information. Consequently, a microprocessor 100 with N VPEs 102 may appear to an operating system as an N-way symmetric multiprocessor. However, as also described herein, a microprocessor 100 with M thread contexts 104 may appear to an operating system as an M-way symmetric multiprocessor, such as shown with respect to FIG. 13. In particular, threads running on the thread contexts 104 may include MFTR instructions 300 and MTTR instructions 400 to read and write another thread context 104 to emulate a directed exception, such as an inter-processor interrupt, as described herein. - The
VPEs 102 share various of the microprocessor 100 resources, such as the instruction cache 202, instruction fetcher 204, instruction decoder 206, instruction issuer 208, instruction scheduler 216, execution units 212, and data cache 242 of FIG. 2, transparently to the operating system. In one embodiment, each VPE 102 substantially conforms to a MIPS32 or MIPS64 Instruction Set Architecture (ISA) and a MIPS Privileged Resource Architecture (PRA), and the VPE context 106 includes the MIPS PRA Coprocessor 0 and system state necessary to describe one or more instantiations thereof. In one embodiment, the VPE context 106 includes the VPEControl Register 504, VPEConf0 Register 505, VPEConf1 Register 506, YQMask Register 591, VPESchedule Register 592, and VPEScheFBack Register 593 of FIGS. 5E-5H and EPC Register 598, Status Register 571, EntryHi Register 526, Context Register 527, and Cause Register 536 of FIGS. 5L-5P described below. - In one respect, a
VPE 102 may be viewed as an exception domain. That is, when an asynchronous exception (such as a hardware or software interrupt) is generated, or when an instruction of one of the thread contexts 104 of a VPE 102 generates a synchronous exception (such as an address error, bus error, or invalid instruction exception), multithreading is suspended on the VPE 102 (i.e., only instructions of the instruction stream associated with the thread context 104 servicing the exception are fetched and issued), and each VPE context 106 includes the state necessary to service the exception. Once the exception is serviced, the exception handler may selectively re-enable multithreading on the VPE 102. When an asynchronous exception such as an interrupt is raised to the VPE 102, the VPE 102 selects one of the eligible (i.e., not marked as exempt from servicing asynchronous exceptions as indicated by the IXMT bit 518 of FIG. 5J) thread contexts 104 of the VPE 102 to execute the exception handler. (The manner used by the VPE 102 to select one of the eligible thread contexts is implementation-dependent, such as selecting pseudo-randomly, in a round-robin fashion, or based on the relative priorities of the thread contexts 104.) That is, the asynchronous exception itself does not specify which thread context 104 of the VPE 102 is to handle the exception. Thus, the microprocessor 100 does not provide a hardware exception mechanism for one thread context 104 to direct an asynchronous exception to another specific thread context 104. Advantageously, the present invention provides a method for operating system software to emulate one thread context 104 directing an asynchronous exception to another specific thread context 104, as described herein. - Referring now to
FIG. 2, a block diagram illustrating in more detail the microprocessor 100 of FIG. 1 is shown. The microprocessor 100 is a pipelined microprocessor comprising a plurality of pipeline stages. The microprocessor 100 includes a plurality of thread contexts 104 of FIG. 1. The embodiment of FIG. 2 shows four thread contexts 104; however, it should be understood that the number four is chosen only for illustration purposes, and the microprocessor 100 described herein embodying the present invention may include any number of thread contexts 104. In one embodiment, the number of thread contexts 104 may be up to 256. Furthermore, a microprocessor 100 may include multiple VPEs 102, each having multiple thread contexts 104. In one embodiment, each thread context 104 comprises a program counter (PC) 222 for storing an address for fetching a next instruction in the associated instruction stream, a general purpose register (GPR) set 224 for storing intermediate execution results of the instruction stream issuing from the thread context based on the program counter 222 value, and other per-thread context 226. In one embodiment, the microprocessor 100 includes a multiplier unit, and the other thread context 226 includes registers for storing results of the multiplier unit specifically associated with multiply instructions in the instruction stream. In one embodiment, the other thread context 226 includes information for uniquely identifying each thread context 104. In one embodiment, the thread identification information includes information for specifying the execution privilege level of the associated thread, such as whether the thread is a kernel, supervisor, or user level thread, such as is stored in the TKSU bits 589 of the TCStatus Register 508 of FIG. 5J. In one embodiment, the thread identification information includes information for identifying a task or process comprising the thread. 
In particular, the task identification information may be used as an address space identifier (ASID) for purposes of translating virtual addresses into physical addresses, such as is stored in the TASID bits 528 of the TCStatus Register 508, which are reflected in the EntryHi Register 526 of FIG. 5N. In one embodiment, the other per-thread context 226 includes the TCStatus Register 508, TCRestart Register 594, TCHalt Register 509, TCContext Register 595, TCSchedule Register 596, TCBind Register 556 and TCScheFBack Register 597 of FIGS. 5J-5L. - The
microprocessor 100 includes a scheduler 216 for scheduling execution of the various threads being concurrently executed by the microprocessor 100. The scheduler 216 is coupled to the VMP context 108 and VPE contexts 106 of FIG. 1 and to the other per-thread context 226. In particular, the scheduler 216 is responsible for scheduling fetching of instructions from the program counter 222 of the various thread contexts 104 and for scheduling issuing of the fetched instructions to execution units 212 of the microprocessor 100, as described below. The scheduler 216 schedules execution of the threads based on a scheduling policy of the microprocessor 100. The scheduling policy may include, but is not limited to, any of the following scheduling policies. In one embodiment, the scheduler 216 employs a round-robin, or time-division-multiplexed, or interleaved, scheduling policy that allocates a predetermined number of clock cycles or instruction issue slots to each ready thread in a rotating order. The round-robin policy is useful in an application in which fairness is important and a minimum quality of service is required for certain threads, such as real-time application program threads. In one embodiment, the scheduler 216 employs a blocking scheduling policy wherein the scheduler 216 continues to schedule fetching and issuing of a currently running thread until an event occurs that blocks further progress of the thread, such as a cache miss, a branch misprediction, a data dependency, or a long latency instruction. In one embodiment, the microprocessor 100 comprises a superscalar pipelined microprocessor, and the scheduler 216 schedules the issue of multiple instructions per clock cycle, and in particular, the issue of instructions from multiple threads per clock cycle, commonly referred to as simultaneous multithreading. - The
microprocessor 100 includes an instruction cache 202 for caching program instructions fetched from a system memory of a system including the microprocessor 100, such as the MFTR/MTTR 300/400 instructions. In one embodiment, the microprocessor 100 provides virtual memory capability, and the fetch unit 204 includes a translation lookaside buffer (TLB) for caching virtual to physical memory page translations. In one embodiment, each thread, or program, or task, executing on the microprocessor 100 is assigned a unique task ID, or address space ID (ASID), which is used to perform memory accesses and in particular memory address translations, and a thread context 104 also includes storage for an ASID associated with the thread. In one embodiment, the various threads executing on the microprocessor 100 share the instruction cache 202 and TLB, as discussed in more detail below. - The
microprocessor 100 also includes a fetch unit 204, coupled to the instruction cache 202, for fetching program instructions, such as MFTR/MTTR 300/400 instructions, from the instruction cache 202 and system memory. The fetch unit 204 fetches instructions at an instruction fetch address provided by a multiplexer 244. The multiplexer 244 receives a plurality of instruction fetch addresses from the corresponding plurality of program counters 222. Each of the program counters 222 stores a current instruction fetch address for a different program thread. The embodiment of FIG. 2 illustrates four different program counters 222 associated with four different threads. The multiplexer 244 selects one of the four program counters 222 based on a selection input provided by the scheduler 216. In one embodiment, the various threads executing on the microprocessor 100 share the fetch unit 204. - The
microprocessor 100 also includes a decode unit 206, coupled to the fetch unit 204, for decoding program instructions fetched by the fetch unit 204, such as MFTR/MTTR 300/400 instructions. The decode unit 206 decodes the opcode, operand, and other fields of the instructions. In one embodiment, the various threads executing on the microprocessor 100 share the decode unit 206. - The
microprocessor 100 also includes execution units 212 for executing instructions. The execution units 212 may include but are not limited to one or more integer units for performing integer arithmetic, Boolean operations, shift operations, rotate operations, and the like; floating point units for performing floating point operations; load/store units for performing memory accesses and in particular accesses to a data cache 242 coupled to the execution units 212; and a branch resolution unit for resolving the outcome and target address of branch instructions. In one embodiment, the data cache 242 includes a translation lookaside buffer (TLB) for caching virtual to physical memory page translations, which is shared by the various thread contexts, as described in more detail below. In addition to the operands received from the data cache 242, the execution units 212 also receive operands from registers of the general purpose register sets 224. In particular, an execution unit 212 receives operands from a register set 224 of the thread context 104 allocated to the thread to which the instruction belongs. A multiplexer 248 selects operands from the appropriate register set 224 for provision to the execution units 212. In addition, the multiplexer 248 receives data from each of the other per-thread contexts 226 and program counters 222, for selective provision to the execution units 212 based on the thread context 104 of the instruction being executed by the execution unit 212. In one embodiment, the various execution units 212 may concurrently execute instructions from multiple concurrent threads. - The
microprocessor 100 also includes an instruction issue unit 208, coupled to the scheduler 216 and coupled between the decode unit 206 and the execution units 212, for issuing instructions to the execution units 212 as instructed by the scheduler 216 and in response to information about the instructions decoded by the decode unit 206. In particular, the instruction issue unit 208 ensures that instructions are not issued to the execution units 212 if they have data dependencies on other instructions previously issued to the execution units 212. In one embodiment, an instruction queue is interposed between the decode unit 206 and the instruction issue unit 208 for buffering instructions awaiting issue to the execution units 212, reducing the likelihood of starvation of the execution units 212. In one embodiment, the various threads executing on the microprocessor 100 share the instruction issue unit 208. - The
microprocessor 100 also includes a write-back unit 214, coupled to the execution units 212, for writing back results of instructions into the general purpose register sets 224, program counters 222, and other thread contexts 226. A demultiplexer 246 receives the instruction result from the write-back unit 214 and stores the instruction result into the appropriate register set 224, program counters 222, and other thread contexts 226 associated with the instruction's thread. The instruction results are also provided for storage into the VPE contexts 106 and the VMP context 108. - Referring now to
FIG. 3, a block diagram illustrating an MFTR instruction 300 executed by the microprocessor 100 of FIG. 1 according to the present invention is shown. FIG. 3 comprises FIG. 3A illustrating the format and function of the MFTR instruction 300, and FIG. 3B illustrating a table 350 specifying selection of the MFTR instruction 300 source register 324 based on its operand values. The mnemonic for the MFTR instruction 300 is MFTR rt, rd, u, sel, h as shown. As shown in FIG. 3, the MFTR instruction 300 instructs the microprocessor 100 to copy the contents of a source register 324 of a target thread context 104 to a destination register 322 of an issuing thread context 104. - Bits 11-15 are an
rd field 308, which specifies an rd register 322, or destination register 322, within the general purpose register set 224 of FIG. 2 of the thread context 104 from which the MFTR instruction 300 is issued, referred to herein as the issuing thread context. In one embodiment, the destination register 322 is one of 32 general purpose registers of the MIPS ISA. - Bits 16-20, 6-10, 5, 4, and 0-2 are an
rt field 306, rx field 312, u field 314, h field 316, and sel field 318, respectively, which collectively are used to specify a source register 324 of a thread context 104 distinct from the issuing thread context, referred to herein as the target thread context 104. The use of the rt field 306, rx field 312, u field 314, h field 316, and sel field 318 to specify the source register 324 is described in detail in table 350 of FIG. 3B. - In one embodiment, the
microprocessor 100 includes one or more processor control coprocessors, referred to in the MIPS PRA as Coprocessor 0, or CP0, or Cop0, denoted 602 in FIGS. 6 and 8, which is generally used to perform various microprocessor 100 configuration and control functions, such as cache control, exception control, memory management unit control, and particularly multithreading control and configuration. As shown in Table 350, a u field 314 value of 0 selects one of the CP0 registers as the MFTR instruction 300 source register 324. Table 500 of FIG. 5A illustrates the particular rt field 306 (or rd 308 in the case of MTTR 400) and sel field 318 values used to select the various multithreading-related CP0 registers. In one embodiment, as shown in Table 350, a u field 314 value of 1 and a sel field 318 value of 0 selects one of the general purpose registers 224 of FIG. 2, selected by the rt field 306 value, as the MFTR instruction 300 source register 324. In one embodiment, the microprocessor 100 includes a digital signal processor (DSP) arithmetic unit or multiplier for performing common DSP-related arithmetic operations, and each thread context 104 includes four accumulators for storing the TC-specific results of the arithmetic operations and a DSPControl register of the DSP accumulators, denoted 224 in FIGS. 6 and 8. A u field 314 value of 1 and a sel field 318 value of 1 selects as the MFTR instruction 300 source register 324 one of the DSP accumulator registers or the DSPControl register, selected by the rt field 306 value, as shown. In one embodiment, the microprocessor 100 includes one or more floating point or multimedia coprocessors, referred to in the MIPS PRA as Coprocessor 1, or CP1, or Cop1, denoted 604 in FIGS. 6 and 8. 
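The bit positions recited above for the rt, rd, rx, u, h, and sel fields, and the u and sel combinations described so far, may be illustrated by the following minimal sketch in C. The names struct mftr_fields, mftr_decode, enum src_class, and mftr_source_class are hypothetical and introduced here only for illustration; the sketch does not check the opcode or sub-opcode, and it groups every selection not yet described into a single SRC_OTHER value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the operand fields of an MFTR instruction word. */
struct mftr_fields {
    unsigned rt;   /* bits 16-20 */
    unsigned rd;   /* bits 11-15 */
    unsigned rx;   /* bits 6-10  */
    unsigned u;    /* bit 5      */
    unsigned h;    /* bit 4      */
    unsigned sel;  /* bits 0-2   */
};

/* Extract the fields from a 32-bit instruction word using the stated bit ranges. */
static struct mftr_fields mftr_decode(uint32_t insn)
{
    struct mftr_fields f;
    f.rt  = (insn >> 16) & 0x1f;
    f.rd  = (insn >> 11) & 0x1f;
    f.rx  = (insn >> 6)  & 0x1f;
    f.u   = (insn >> 5)  & 0x1;
    f.h   = (insn >> 4)  & 0x1;
    f.sel = insn & 0x7;
    return f;
}

/* Register classes described so far; remaining sel values are lumped together. */
enum src_class { SRC_CP0, SRC_GPR, SRC_DSP, SRC_OTHER };

/* Classify the source register from the u and sel fields. */
static enum src_class mftr_source_class(struct mftr_fields f)
{
    assert(f.u <= 1 && f.sel <= 7);
    if (f.u == 0)
        return SRC_CP0;              /* u = 0: a CP0 register chosen by rt and sel */
    switch (f.sel) {
    case 0:  return SRC_GPR;         /* u = 1, sel = 0: general purpose register rt */
    case 1:  return SRC_DSP;         /* u = 1, sel = 1: DSP accumulator or DSPControl */
    default: return SRC_OTHER;       /* other coprocessor registers */
    }
}
```

For example, a word with rt = 5, rd = 9, u = 1, and sel = 1 decodes to a DSP register of the target thread context selected by rt, with the result destined for general purpose register 9 of the issuing thread context.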
As shown in Table 350, a u field 314 value of 1 and a sel field 318 value of 2 selects as the MFTR instruction 300 source register 324 one of the floating point unit data registers (FPR) selected by the rt field 306 value; furthermore, a sel field 318 value of 3 selects as the MFTR instruction 300 source register 324 one of the floating point unit control registers (FPCR) selected by the rt field 306 value. In one embodiment, the microprocessor 100 includes one or more implementation-specific coprocessors, referred to in the MIPS PRA as Coprocessor 2, or CP2, or Cop2, denoted 606 in FIGS. 6 and 8. As shown in Table 350, a u field 314 value of 1 and a sel field 318 value of 4 selects as the MFTR instruction 300 source register 324 one of the CP2 data registers (Cop2 Data) selected by the concatenation of the rx field 312 value and the rt field 306 value; furthermore, a sel field 318 value of 5 selects as the MFTR instruction 300 source register 324 one of the CP2 control registers (Cop2 Control) selected by the concatenation of the rx field 312 value and the rt field 306 value. - The source register 324 is further specified by a
TargTC operand 332. The TargTC 332 operand specifies the target thread context 104 containing the source register 324. In one embodiment, the TargTC operand 332 is stored in the VPEControl Register 504 of FIG. 5E. If the source register 324 is a per-VPE 102 register, the source register 324 is of the VPE 102 to which the target thread context 104 is bound, as specified by the CurVPE field 558 of the TCBind Register 556 of FIG. 5K. - Referring now to
FIG. 4, a block diagram illustrating an MTTR instruction 400 executed by the microprocessor 100 of FIG. 1 according to the present invention is shown. FIG. 4 comprises FIG. 4A illustrating the format and function of the MTTR instruction 400, and FIG. 4B illustrating a table 450 specifying selection of the MTTR instruction 400 destination register 422 based on its operand values. The various fields of the MTTR instruction 400 are identical to the fields of the MFTR instruction 300, except that the value of the sub-opcode field 404 is different, and the use of the rt field 306 and rd field 308 is reversed, i.e., the rt field 306 is used by the MTTR instruction 400 to select the source register 424 and the rd field 308 is used, along with the rx 312, u 314, h 316, and sel 318 fields, to select the destination register 422 within the thread context 104 specified by the TargTC 332 operand, as shown in FIG. 4. As shown in FIG. 4, the MTTR instruction 400 instructs the microprocessor 100 to copy the contents of a source register 424 of the issuing thread context 104 to a destination register 422 of the target thread context 104. - Referring now to
FIG. 5, a series of block diagrams illustrating various multithreading-related registers of the microprocessor 100 of FIG. 1 according to one embodiment of the present invention is shown. FIG. 5 comprises FIGS. 5A-5P. In one embodiment, the registers of FIG. 5 are comprised in CP0 602 of FIGS. 6 and 8, and FIG. 5A is a table 500 indicating the particular rt field 306 (or rd 308 in the case of MTTR 400) and sel field 318 values used to select the various multithreading-related CP0 registers 602. As indicated in table 500, some of the registers are included in the VMP context 108 of FIG. 1 (i.e., are per-microprocessor 100 registers), some of the registers are included in the VPE contexts 106 of FIG. 1 (i.e., are per-VPE 102 registers), and some of the registers are included in the thread contexts 104 of FIG. 1 (i.e., are per-thread context 104 registers). Most of FIGS. 5B-5P include an illustration of the fields of each of the multithreading registers and a table describing the various fields. Fields of particular relevance are discussed in more detail herein. Each of the registers illustrated in FIG. 5 of one thread context (i.e., the target thread context 104) may be selectively read and/or written by another thread context 104 (i.e., the issuing thread context 104) that executes an MFTR 300 or MTTR 400 instruction, respectively, depending upon the readability or writeability of the particular register or bits thereof. - The
EVP bit 513 of FIG. 5B controls whether the microprocessor 100 is executing as a virtual multiprocessor, i.e., whether multiple VPEs 102 may concurrently fetch and issue instructions from distinct threads of execution. The PVPE field 524 of FIG. 5C specifies the total number of VPEs 102, i.e., the total number of VPE contexts 106, instantiated in the microprocessor 100. In the embodiment of FIG. 5, up to sixteen VPEs 102 may be instantiated in the microprocessor 100. The PTC field 525 of FIG. 5C specifies the total number of thread contexts 104 instantiated in the microprocessor 100. In the embodiment of FIG. 5, up to 256 thread contexts 104 may be instantiated in the microprocessor 100. The TE bit 543 of FIG. 5E controls whether multithreading is enabled or disabled within a VPE 102. In one embodiment, the effect of clearing the EVP bit 513 and TE bit 543 may not be instantaneous; consequently the operating system should execute a hazard barrier instruction to ensure that all VPEs 102 and thread contexts 104, respectively, have been quiesced. - As discussed above,
TargTC field 332 of FIG. 5E is used by an issuing thread context 104 to specify the thread context 104 that contains the source register 324 in the case of an MFTR instruction 300 or the destination register 422 in the case of an MTTR instruction 400. In one embodiment, the issuing thread context 104 executes an instruction prior to the MFTR/MTTR instruction 300/400 to populate the TargTC 332 field of the VPEControl Register 504. In one embodiment, a single TargTC 332 value per VPE 102 is sufficient since multithreading must be disabled on the VPE 102 issuing the MFTR/MTTR 300/400 instruction; hence, none of the other thread contexts 104 of the VPE 102 may be using the TargTC 332 field of the VPEControl Register 504 of the issuing VPE 102. In an alternate embodiment, the TargTC 332 value may be provided within a field of the MFTR/MTTR 300/400 instructions. The TargTC field 332 is used to specify the target thread context 104 independent of the VPE 102 to which the target thread context 104 is bound. Each thread context 104 in the microprocessor 100 has a unique number, or identifier, specified in the CurTC field 557 of the TCBind Register 556 of FIG. 5K, with values 0 through N-1, where N is the number of instantiated thread contexts 104, and N may be up to 256. If the target register (source register 324 of an MFTR instruction 300, or destination register 422 of an MTTR instruction 400) is a per-TC register, then the target register is in the thread context 104 specified by the TargTC 332 value; if the target register is a per-VPE register, then the target register is in the VPE 102 to which the thread context 104 specified in the TargTC 332 is bound. - The TCU0 . . .
TCU3 bits 581 of the TCStatus Register 508 of FIG. 5J control and indicate whether the thread context 104 controls access to its VPE's 102 Coprocessors 0 through 3, respectively. The TCU0 . . . TCU3 bits 581 and TKSU bits 589 of the TCStatus Register 508 correspond to the CU0 . . . CU3 bits 572 and the KSU bits 574, respectively, of the Status Register 571 of FIG. 5M; and the TASID bits 528 of the TCStatus Register 508 correspond to the ASID bits 538 of the Coprocessor 0 EntryHi Register 526 of FIG. 5N described in the MIPS32® Architecture for Programmers Volume III: The MIPS32® Privileged Resource Architecture, Document Number: MD00090, Revision 2.50, Jul. 1, 2005, available from MIPS Technologies, 1225 Charleston Road, Mountain View, Calif. 94043-1353. In particular, each time the bits are written in one of the registers, the corresponding change is reflected by a read of the other register. For example, if a new value is written to the TKSU bits 589, the new value may be read from the KSU bits 574 of the Status Register 571, and vice versa. For another example, if a new value is written to the ASID bits 538 of the EntryHi Register 526, the new value may be read from the TASID bits 528 of the TCStatus Register 508, and vice versa. - The
TCContext Register 595 of FIG. 5L is a read/write register usable by the operating system as a pointer to a thread context-specific storage area in memory, such as a thread context control block. The TCContext Register 595 may be used by the operating system, for example, to save and restore state of a thread context 104 when the program thread associated with the thread context 104 must be swapped out for use by another program thread. - The
RNST bits 582 of the TCStatus Register 508 indicate the state of the thread context 104, namely whether the thread context 104 is running or blocked, and if blocked the reason for blockage. The RNST 582 value is only stable when read by an MFTR instruction 300 if the target thread context 104 is in a halted state, which is described below; otherwise, the RNST 582 value may change asynchronously and unpredictably. When a thread context 104 is in the running state, the microprocessor 100 will fetch and issue instructions from the thread of execution specified by the thread context 104 program counter 222 according to the scheduler 216 scheduling policy. - Independently of whether a
thread context 104 is free or activated, a thread context 104 may be halted if the H bit 599 of the TCHalt Register 509 of FIG. 5K is set. That is, a first thread context 104 running an operating system thread may halt a second thread context 104 by writing a 1 to the H bit 599 of the TCHalt Register 509 of the second thread context 104. A free thread context 104 has no valid content and the microprocessor 100 does not schedule instructions of a free thread context 104 to be fetched or issued. The microprocessor 100 schedules instructions of an activated thread context 104 to be fetched and issued from the activated thread context 104 program counter 222. The microprocessor 100 schedules only activated thread contexts 104. The microprocessor 100 allows the operating system to allocate only free thread contexts 104 to create new threads. Setting the H bit 599 of an activated thread context 104 causes the thread context 104 to cease fetching instructions and to load the TCRestart register 594 of FIG. 5K with its restart address 549, i.e., the address of the next instruction to be issued for the thread context 104. Only a thread context 104 in a halted state is guaranteed to be stable as seen by other thread contexts 104, i.e., when examined by an MFTR instruction 300. Multithreaded execution may be temporarily inhibited on a VPE 102 due to exceptions or explicit software interventions, but activated thread contexts 104 that are inhibited in such cases are considered to be suspended, rather than implicitly halted.
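The halting rules above can be illustrated with a small software model. This is only a sketch under stated assumptions: the `tc_model` structure, its field names, and the helper functions are hypothetical and do not correspond to the actual register layouts or to the source code listing in the appendix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one thread context (TC). The fields are
 * illustrative only and do not reflect real register encodings. */
typedef struct {
    bool     activated;  /* false = free: no valid content, never scheduled */
    bool     halted;     /* models the H bit of the TCHalt Register */
    uint32_t tcrestart;  /* models the TCRestart register */
    uint32_t next_pc;    /* address of the next instruction to be issued */
} tc_model;

/* Only activated thread contexts that are not halted are fetched and issued. */
static bool tc_is_schedulable(const tc_model *tc)
{
    return tc->activated && !tc->halted;
}

/* Models writing 1 to the H bit: fetching stops and, for an activated TC,
 * the restart address is captured in TCRestart. */
static void tc_halt(tc_model *tc)
{
    if (tc->activated && !tc->halted)
        tc->tcrestart = tc->next_pc;
    tc->halted = true;
}

/* A TC is guaranteed stable to other TCs only while it is halted. */
static bool tc_is_stable_for_mftr(const tc_model *tc)
{
    return tc->halted;
}

/* Reads the modeled TCRestart value; the caller must have halted the TC. */
static uint32_t tc_read_restart(const tc_model *tc)
{
    assert(tc_is_stable_for_mftr(tc));
    return tc->tcrestart;
}
```

In this model a caller would invoke `tc_halt` on the target before `tc_read_restart`, mirroring the requirement that a thread context be halted before it is examined.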
A suspended thread context 104 is inhibited from any action which might cause exceptions or otherwise change global VPE 102 privileged resource state, but unlike a halted thread, a suspended thread context 104 may still have instructions active in the pipeline; consequently, the suspended thread context 104, including general purpose registers 224 values, may still be unstable; therefore, the thread context 104 should not be examined by an MFTR instruction 300 until the thread context 104 is halted. In one embodiment, the effect of setting the H bit 599 may not be instantaneous; consequently the operating system should execute a hazard barrier instruction to ensure that the target thread context has been quiesced. - When a
thread context 104 is in a halted state, the TCRestart Register 594 may be read to obtain the address 549 of the instruction at which the microprocessor 100 will resume execution of the thread context 104 when the thread context 104 is restarted. In the case of branch and jump instructions with architectural branch delay slots, the restart address 549 will advance beyond the address of the branch or jump instruction only after the instruction in the delay slot has been retired. If the thread context 104 is halted between the execution of a branch instruction and the associated delay slot instruction, the branch delay slot is indicated by the TDS bit 584 of the TCStatus Register 508. - Conversely, the TCRestart register 594 can be written while its
thread context 104 is halted to change the address at which the thread context 104 will restart. Furthermore, a first thread context 104 running an operating system thread may restart a second thread context 104 by writing a 0 to the H bit 599 of the TCHalt Register 509 of the second thread context 104. Clearing the H bit 599 of an activated thread context 104 allows the thread context 104 to be scheduled, and begin fetching and executing instructions at its restart address 549 specified in its TCRestart register 594. - In the MIPS PRA, the
Coprocessor 0 EPC Register 598 of FIG. 5L contains the address at which the exception-servicing thread context 104 will resume execution after an exception has been serviced and the thread context 104 executes an ERET (exception return) instruction. That is, when the thread running on the thread context 104 executes an ERET instruction, the VPE 102 reads the EPC Register 598 to determine the address at which to begin fetching and issuing instructions. Unless the EXL bit 576 of the Status Register 571 of FIG. 5M is already set, the microprocessor 100 writes the EPC Register 598 when an exception is raised. For synchronous exceptions, the microprocessor 100 writes the address of the instruction that was the direct cause of the exception, or the address of the immediately preceding branch or jump instruction, if the exception-causing instruction is in a branch delay slot. For asynchronous exceptions, the microprocessor 100 writes the address of the instruction at which execution will be resumed. - In a MIPS
MT ASE microprocessor 100, the EPC Register 598 is instantiated for each VPE 102 in the microprocessor 100. When an exception is raised to a VPE 102, the VPE 102 selects one of its thread contexts 104 to service the exception. All thread contexts 104 of the VPE 102, other than the thread context 104 selected to service the exception, are stopped and suspended until the EXL bit 576 and ERL bit 575 of the Status Register 571 are cleared. When a synchronous exception is raised due to the execution of an instruction contained in a thread of execution, the microprocessor 100 selects the thread context 104 running the thread containing the offending instruction to service the exception. That is, the general purpose registers 224, program counter 222, and other per-thread context 226 of the offending thread context 104 are used to service the synchronous exception. When an asynchronous exception is raised, such as an interrupt, the microprocessor 100 selects one of the eligible thread contexts 104 bound to the VPE 102 to service the asynchronous exception. The VPE 102 to which a thread context 104 is bound (as indicated by the CurVPE field 558 of the TCBind Register 556) is the exception domain for the thread context 104. In particular, a VPE 102 selects a thread context 104 bound to it, i.e., within its exception domain, to service an exception. Additionally, a thread context 104 utilizes the resources related to handling exceptions (such as the Coprocessor 0 EPC Register 598 and Status Register 571) of the exception domain, or VPE 102, to which the thread context 104 is bound when servicing an exception. The method for choosing the eligible thread context 104 to service an asynchronous exception is implementation-dependent and may be adapted to satisfy the particular application in which the microprocessor 100 is employed.
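The selection rule just described can be sketched as a small function. Because the choice among eligible thread contexts is implementation-dependent, the policy below (lowest-numbered eligible thread context bound to the VPE) is purely an assumption made for illustration, and the `tc_info` type, its `eligible` flag, and `select_servicing_tc` are hypothetical names that do not appear in the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-TC summary used only by this sketch. */
typedef struct {
    int  cur_vpe;   /* models the CurVPE field: the VPE (exception domain) of the TC */
    bool eligible;  /* assumed flag: TC may be chosen for asynchronous exceptions */
} tc_info;

/* Returns the index of the TC chosen to service an exception raised to `vpe`,
 * or -1 if none qualifies.
 * - Synchronous exception (offending_tc >= 0): the TC that ran the offending
 *   instruction services it.
 * - Asynchronous exception (offending_tc < 0): ASSUMED policy, pick the
 *   lowest-numbered eligible TC bound to the VPE. */
static int select_servicing_tc(const tc_info *tcs, size_t n, int vpe, int offending_tc)
{
    if (offending_tc >= 0) {
        /* The offending TC must lie within the VPE's exception domain. */
        assert((size_t)offending_tc < n && tcs[offending_tc].cur_vpe == vpe);
        return offending_tc;
    }
    for (size_t i = 0; i < n; i++) {
        if (tcs[i].cur_vpe == vpe && tcs[i].eligible)
            return (int)i;
    }
    return -1;
}
```

A round-robin or priority-based policy would satisfy the same constraint; the only property the sketch relies on is that the chosen thread context is bound to the VPE that received the exception.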
However, as discussed herein, the MIPS MT ASE does not provide the capability for the asynchronous exception to specify which of the thread contexts 104 must service the asynchronous exception. The microprocessor 100 saves the restart address of the thread context 104 selected to service the exception in the EPC Register 598 of the VPE 102 to which the selected thread context 104 is bound. Additionally, a thread context 104 may be made ineligible for being selected to service an asynchronous exception by setting the IXMT bit 518 in its TCStatus Register 508. - In one embodiment, the
program counter 222 of FIG. 2 is not an architecturally-visible register, but is affected indirectly by various events and instructions. Effectively, the program counter 222 is a virtual program counter represented by various storage elements within the microprocessor 100 pipeline, and the meaning or value of the program counter 222 depends upon the context in which it is examined or updated. For example, as a thread context 104 fetches instructions from the instruction cache 202, the program counter 222 value is the address at which the instructions are being fetched. Thus, in this context the storage element storing the current fetch address may be viewed as the program counter 222. For another example, when an exception is taken and the VPE 102 selects a thread context 104 to service the exception, the address written by the VPE 102 to the EPC Register 598 may be viewed as the program counter 222 value of the selected thread context 104 in this situation since when the selected thread context 104 executes an ERET instruction, fetching for the thread context 104 begins at the EPC Register 598 value. For another example, the TCRestart register 594 of a thread context 104 may be viewed as the program counter 222 when a thread context 104 is halted since when the thread context 104 is unhalted, fetching for the thread context 104 begins at the TCRestart register 594 value. - The
Coprocessor 0 Status Register 571 of FIG. 5M is instantiated for each VPE 102 in the microprocessor 100. Only certain fields of the Status Register 571 are described herein. For a more detailed description of the other bits in the Status Register 571, the reader is referred to the document MIPS32® Architecture for Programmers Volume III: The MIPS32® Privileged Resource Architecture, Document Number: MD00090, Revision 2.50, Jul. 1, 2005, which is hereby incorporated by reference in its entirety for all purposes. As discussed above, the CU0 . . . CU3 bits 572 and the KSU bits 574 correspond to the TCU0 . . . TCU3 bits 581 and TKSU bits 589, respectively, of the TCStatus Register 508 of FIG. 5J. The ERL bit 575 is set by the microprocessor 100 hardware whenever a Reset, Soft Reset, NMI, or Cache Error exception is taken. The EXL bit 576 is set by the microprocessor 100 hardware whenever any other exception is taken. When ERL 575 or EXL 576 is set, the VPE 102 is running in kernel mode with interrupts disabled. When the IE bit 577 is clear, all interrupts for the VPE 102 are disabled. - Referring now to
FIG. 6, a block diagram illustrating data paths of the microprocessor 100 for performing the MFTR instruction 300 according to the present invention is shown. The microprocessor 100 includes selection logic 636 that receives the contents of each of the registers of Coprocessor 0 602, Coprocessor 1 604, Coprocessor 2 606, and the general purpose and DSP accumulator registers 224 of FIG. 2 and selects the source register 324 contents, which is one of the register contents from the target thread context 104, for provision to deselection logic 638 based on values of the rt 306 operand, the rx 312 operand, the u 314 operand, the h 316 operand, and the sel 318 operand of the MFTR instruction 300, as well as the TargTC 332 operand. The deselection logic 638 receives the source register 324 contents selected by the selection logic 636 and writes the selected contents into the destination register 322, which is one of the general purpose registers 224 of the issuing thread context 104, based on the value of the rd 308 operand of the MFTR instruction 300, as well as signals indicating the issuing VPE 102 and the issuing thread context 104, respectively. - Referring now to
FIG. 7, a block diagram illustrating data paths of the microprocessor 100 for performing the MTTR instruction 400 according to the present invention is shown. The microprocessor 100 includes selection logic 738 that receives the contents of each of the general purpose registers 224 of the issuing thread context 104 and selects the source register 424 contents, which is one of the register contents from the issuing thread context 104, for provision to deselection logic 736 based on the value of the rt 306 operand of the MTTR instruction 400, as well as signals indicating the issuing VPE 102 and the issuing thread context 104, respectively. The deselection logic 736 receives the source register 424 contents selected by the selection logic 738 and writes the selected contents into the destination register 422, which is one of the registers of Coprocessor 0 602, Coprocessor 1 604, Coprocessor 2 606, or the general purpose and DSP accumulator registers 224 of FIG. 2, based on values of the rd 308 operand, the rx 312 operand, the u 314 operand, the h 316 operand, and the sel 318 operand of the MTTR instruction 400, as well as the TargTC 332 operand. In one embodiment, the selection and deselection logic of FIGS. 6 and 7 may comprise a hierarchy of multiplexers, demultiplexers, data buses, and control logic for generating a plurality of bank and register selectors to control the multiplexers and demultiplexers for selecting the appropriate values from the specified register for provision on the data buses. In one embodiment, the data paths may also include intermediate registers for storing the values transferred between the issuing and target thread contexts over multiple clock cycles. - Referring now to
FIG. 8, a flowchart illustrating operation of the microprocessor 100 to execute the MFTR instruction 300 according to the present invention is shown. Flow begins at block 802. - At
block 802, the instruction issuer 208 of FIG. 2 issues an MFTR instruction 300 to the execution units 212. Flow proceeds to decision block 803. - At
decision block 803, the execution unit 212 examines the TKSU bits 589 of the TCStatus Register 508 to determine whether the privilege level of the issuing thread context 104 is at kernel privilege level. If so, flow proceeds to decision block 804; otherwise, flow proceeds to block 805. - At
block 805, the execution unit 212 raises an exception to the MFTR instruction 300 since the issuing thread context 104 does not have sufficient privilege level to execute the MFTR instruction 300. Flow ends at block 805. - At
decision block 804, the execution unit 212 determines whether the target thread context 104 is halted by examining the value of the H bit 599 of the TCHalt Register 509 of FIG. 5K. If the target thread context 104 is halted, flow proceeds to decision block 808; otherwise flow proceeds to block 816. - At
decision block 808, the execution unit 212 examines the TargTC 332 value of the issuing VPE 102 VPEControl Register 504 to determine whether the TargTC 332 value is valid. In one embodiment, the TargTC 332 value is not valid if the issuing VPE is not the master VPE 102, as indicated by a clear value in the MVP bit 553 of the VPEConf0 Register 505 of FIG. 5F. In one embodiment, the TargTC 332 value is not valid if the thread context 104 specified by TargTC 332 is not instantiated. If the TargTC 332 value is valid, flow proceeds to decision block 812; otherwise, flow proceeds to block 816. - At
decision block 812, the execution unit 212 examines the TCU bits 581 in the TCStatus Register 508 of FIG. 5J to determine whether the MFTR instruction 300 references a coprocessor, and if so, whether the coprocessor is bound to and accessible by the target thread context 104 specified by the TargTC 332 value. If the MFTR instruction 300 references a coprocessor, and the coprocessor is not bound to and accessible by the target thread context 104 specified by the TargTC 332 value, flow proceeds to block 816; otherwise, flow proceeds to decision block 814. - At
decision block 814, the execution unit 212 determines whether the source register 324 specified by the MFTR instruction 300 is instantiated. If so, flow proceeds to block 824; otherwise, flow proceeds to block 816. - At
block 816, the results of the MFTR instruction 300 are invalid. That is, the microprocessor 100 attempts to perform block 824; however, the source, destination, and values of the data transfer are invalid. Flow ends at block 816. - At
block 824, the execution unit 212 copies the contents of the source register 324 of the target thread context 104 to the destination register 322 of the issuing thread context 104. In one embodiment, the microprocessor 100, after reading the source register 324, updates the source register 324 with an update value. In one embodiment, the read/update is performed atomically. In one embodiment, the update value is provided in the GPR 224 specified by the rd field 308 in the MFTR instruction 300. Flow ends at block 824. - Referring now to
FIG. 9, a flowchart illustrating operation of the microprocessor 100 to execute the MTTR instruction 400 according to the present invention is shown. Flow begins at block 902. - At
block 902, the instruction issuer 208 of FIG. 2 issues an MTTR instruction 400 to the execution units 212. Flow proceeds to decision block 903. - At
decision block 903, the execution unit 212 examines the TKSU bits 589 of the TCStatus Register 508 to determine whether the privilege level of the issuing thread context 104 is at kernel privilege level. If so, flow proceeds to decision block 904; otherwise, flow proceeds to block 905. - At
block 905, the execution unit 212 raises an exception to the MTTR instruction 400 since the issuing thread context 104 does not have sufficient privilege level to execute the MTTR instruction 400. Flow ends at block 905. - At
decision block 904, the execution unit 212 determines whether the target thread context 104 is halted by examining the value of the H bit 599 of the TCHalt Register 509 of FIG. 5K. If the target thread context 104 is halted, flow proceeds to decision block 908; otherwise flow proceeds to block 916. - At
decision block 908, the execution unit 212 examines the TargTC 332 value of the issuing VPE 102 VPEControl Register 504 to determine whether the TargTC 332 value is valid. In one embodiment, the TargTC 332 value is not valid if the issuing VPE is not the master VPE 102, as indicated by a clear value in the MVP bit 553 of the VPEConf0 Register 505 of FIG. 5F. In one embodiment, the TargTC 332 value is not valid if the thread context 104 specified by TargTC 332 is not instantiated. If the TargTC 332 value is valid, flow proceeds to decision block 912; otherwise, flow proceeds to block 916. - At
decision block 912, the execution unit 212 examines the TCU bits 581 in the TCStatus Register 508 of FIG. 5J to determine whether the MTTR instruction 400 references a coprocessor, and if so, whether the coprocessor is bound to and accessible by the target thread context 104 specified by the TargTC 332 value. If the MTTR instruction 400 references a coprocessor, and the coprocessor is not bound to and accessible by the target thread context 104 specified by the TargTC 332 value, flow proceeds to block 916; otherwise, flow proceeds to decision block 914. - At
decision block 914, the execution unit 212 determines whether the destination register 422 specified by the MTTR instruction 400 is instantiated. If so, flow proceeds to block 924; otherwise, flow proceeds to block 916. - At
block 916, the microprocessor 100 performs no operation because there is no valid destination register to which the source data may be written. Flow ends at block 916. - At
block 924, the execution unit 212 copies the contents of the source register 424 of the issuing thread context 104 to the destination register 422 of the target thread context 104. Flow ends at block 924. - Referring now to
FIG. 10, a flowchart illustrating a method for performing an inter-processor interrupt (IPI) from one thread context 104 to another thread context 104 within a VPE 102 of the microprocessor 100 of FIG. 1 according to the present invention is shown. The steps of the flowchart substantially correlate to the source code listing included in the computer program listing appendix, and reference is made within the description of FIG. 10 to the source code listing. The source code listing is for a version of the Linux SMP operating system modified to view each thread context 104 of the microprocessor 100 as a separate processor, or CPU, which is referred to herein as symmetric multi-thread context (SMTC) Linux. The source code listing includes two C language functions (smtc_send_ipi and post_direct_ipi), one assembly language routine (smtc_ipi_vector), and one assembly language macro (CLI). - Within the flowchart, reference is made to a thread A running on a
thread context A 104 and a thread B running on a thread context B 104. Thread A running on thread context A 104 directs a software-emulated inter-processor interrupt (IPI) to thread context B 104, by employing MFTR instructions 300 and MTTR instructions 400. In the example of the flowchart, thread context A 104 and thread context B 104 are bound to the same VPE 102. Although the flowchart of FIG. 10 illustrates only an intra-VPE IPI, the source code listing also includes instructions at lines 23-28 for directing a cross-VPE IPI, or inter-VPE IPI. A first thread context 104 is said to direct an inter-VPE IPI to a second thread context 104 if the second thread context 104 is bound to a different VPE 102 than the first thread context 104. The code performs an inter-VPE IPI by placing an IPI message on a queue associated with the target thread context 104. The message specifies the target thread context 104. In the embodiment described in the source code at lines 23-28, the message specifies the target thread context 104 implicitly by being on the queue associated with the target thread context 104. The operating system samples the queue and drains it each time the operating system performs a context switch and returns from exception. After queuing the message, the code issues a MIPS PRA asynchronous software interrupt to the target VPE 102 (i.e., to the VPE 102 to which the target thread context 104 is bound) by executing an MTTR instruction 400 (within the write_vpe_c0_cause routine) to set one of the software interrupt bits in the MIPS PRA Cause Register 536 of FIG. 5P of the target VPE 102, which will cause the queue to be sampled and drained.
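The queue-then-interrupt pattern described above might be modeled as follows. This is a minimal single-threaded sketch, not the code of the appendix: the `ipi_msg`, `ipi_queue`, and `vpe_model` types, the fixed capacity, and all function names are assumptions, and a real kernel would additionally need locking around the queue, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IPI_QUEUE_CAP 8  /* arbitrary capacity chosen for this sketch */

typedef struct { int type; } ipi_msg;  /* hypothetical message payload */

/* One queue per target thread context; the target is implied by the queue. */
typedef struct {
    ipi_msg slots[IPI_QUEUE_CAP];
    size_t  head;
    size_t  count;
} ipi_queue;

typedef struct {
    bool sw_int_pending;  /* models one software interrupt bit of the Cause Register */
} vpe_model;

/* Appends a message in FIFO order; returns false if the queue is full. */
static bool ipi_enqueue(ipi_queue *q, ipi_msg m)
{
    if (q->count == IPI_QUEUE_CAP)
        return false;
    q->slots[(q->head + q->count) % IPI_QUEUE_CAP] = m;
    q->count++;
    return true;
}

/* Queues the message for the target TC, then raises a software interrupt
 * on the VPE to which the target TC is bound. */
static bool send_cross_vpe_ipi(ipi_queue *target_q, vpe_model *target_vpe, ipi_msg m)
{
    assert(target_q != NULL && target_vpe != NULL);
    if (!ipi_enqueue(target_q, m))
        return false;
    target_vpe->sw_int_pending = true;
    return true;
}

/* Drains the queue in FIFO order, passing each message to `handle` if it is
 * non-NULL; returns the number of messages removed. */
static size_t ipi_drain(ipi_queue *q, void (*handle)(ipi_msg))
{
    size_t handled = 0;
    while (q->count > 0) {
        ipi_msg m = q->slots[q->head];
        q->head = (q->head + 1) % IPI_QUEUE_CAP;
        q->count--;
        if (handle != NULL)
            handle(m);
        handled++;
    }
    return handled;
}
```

In this model, `ipi_drain` stands in for the sampling performed on context switch and return from exception; the sketch does not model which thread context of the target VPE ends up servicing the software interrupt.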
If the thread context 104 selected by the target VPE 102 to service the software interrupt is the target of the IPI, then the selected thread context 104 will service the IPI directly; otherwise, the selected thread context 104 will direct an intra-VPE IPI to the target thread context 104 in a manner similar to the operation described in the flowchart of FIG. 10. - As described above, when an asynchronous hardware interrupt (such as a periodic timer interrupt used for operating system task scheduling purposes) is requested in a MIPS MT ASE processor, the
VPE 102 that received the hardware interrupt request selects an eligible thread context (in this example, thread context A 104) to handle the exception. In the MIPS architecture, when a hardware interrupt request is made, control is transferred to a general exception vector of the operating system. The general exception vector decodes the cause of the exception and invokes the appropriate interrupt request handler (in this example, thread A), such as the timer handler. - The Linux SMP kernel for the MIPS architecture assumes that every processor, or CPU, in the SMP system will get a periodic interrupt, and divides the work performed by the timer interrupt handler into a local clock interrupt function that executes on all CPUs, and a system clock interrupt function that executes only on one CPU of the SMP system. In the MIPS processor architecture, each
VPE 102 includes one timer in Coprocessor 0 shared by all thread contexts 104 bound to the VPE 102 (see the Count/Compare register pairs described in MIPS32® Architecture for Programmers Volume III: The MIPS32® Privileged Resource Architecture, Document Number: MD00090, Revision 2.50, Jul. 1, 2005). In one embodiment of SMTC Linux, only one of the timers of one of the VPEs 102 is invoked as the single timer for all CPUs of the SMP system. In another embodiment, the timer of each of the VPEs 102 is invoked for all CPUs of that VPE 102. The thread context 104 selected to service the asynchronous timer interrupt executes the system clock interrupt function and then broadcasts, or directs, an IPI to all the other thread contexts 104 of the VPE 102. The directed IPI is a local clock interrupt type IPI which instructs the receiving thread contexts 104 to execute only the local clock interrupt function. Although the SMTC Linux timer interrupt handler directs an IPI message to each thread context 104 known to the operating system as a processor, the flowchart of FIG. 10 only illustrates directing an IPI to one thread context 104, which is thread context B 104 in this example. The operation of the microprocessor 100 in response to a timer interrupt to perform preemptive task scheduling is described in more detail in FIG. 11. Flow begins at block 1002. - At
block 1002, at source code line 38, thread A running on thread context A 104 halts thread B running on thread context B 104 by executing an MTTR instruction 400 to set the H bit 599 of the TCHalt Register 509 of FIG. 5K. It is noted that the C language function write_tc_c0_tchalt includes the MTTR instruction 400. The function settc at line 36 populates the TargTC field 332 of the VPEControl Register 504 of FIG. 5E with the thread context 104 identifier of the specified thread context 104 (in the example, thread context B 104) for the benefit of the MTTR instruction 400 of the write_tc_c0_tchalt function. Flow proceeds to block 1004. - At
block 1004, at lines 95-100 (via the call to post_direct_ipi at line 64), thread A creates a new stack frame on the kernel stack of thread context B 104. In one embodiment, the new stack frame is effectively created by the assignment of a value to the kernel stack pointer of thread context B 104, and storing values on the new stack frame comprises storing values at predetermined offsets from the kernel stack pointer value. It is also noted that if the target thread context 104 is exempted from taking interrupts (as indicated by a set IXMT bit 518 of FIG. 5J), the code cannot spin waiting for the target thread context 104 to become non-exempted from taking interrupts because this may lead to a deadlock condition. Therefore, the code places the IPI message on the target thread context's 104 queue at lines 48-62, in a manner similar to the inter-VPE IPI issued at line 24; however, in this case no inter-VPE 102 software interrupt is necessary. Flow proceeds to block 1006. - At
block 1006, at line 82, thread A reads the TCStatus Register 508 of thread context B 104 via the function read_tc_c0_tcstatus, which includes an MFTR instruction 300. The TCStatus Register 508 includes the thread context B 104 execution privilege level and interrupt exemption status, among other things. Thread A, at line 104, also saves the TCStatus Register 508 value to the stack frame created at block 1004. Flow proceeds to block 1008. - At
block 1008, at line 83, thread A reads the restart address 549 of thread B from TCRestart register 594 of thread context B 104 via the function read_tc_c0_tcrestart, which includes an MFTR instruction 300. Thread A, at line 102, also saves the restart address 549 to the stack frame created at block 1004. Flow proceeds to block 1012. - At
block 1012, thread A saves the IPI message reference, together with the address of the common IPI handler populated at line 108, on the stack frame created at block 1004. In the embodiment of the source code listing, advantageously, the code manipulates the target thread context B 104 and stack frame such that a common IPI handler may be invoked to support SMTC operation. The common IPI handler is invoked to handle both software emulated interrupts described herein and actual hardware interrupts, i.e., interrupts for which target thread context B 104 is the thread context 104 selected by the VPE 102 to handle the hardware interrupt request, such as may be invoked at block 1114 of FIG. 11. Flow proceeds to block 1014. - At
block 1014, at lines 110-112, thread A writes the TCStatus Register 508 of thread context B 104 via the function write_tc_c0_tcstatus, which includes an MTTR instruction 400, to set the execution privilege level of thread context B 104 to kernel mode and to disable, or exempt, thread context B 104 from receiving interrupts. Conceptually, thread A would set the EXL bit 576 in Coprocessor 0 Status Register 571 in order to emulate an exception. However, when EXL 576 is set, multithreading is disabled on the VPE 102, i.e., only one thread context 104 is allowed to run when EXL 576 is set, and thread A needs thread context B 104 to run when un-halted below at block 1018. Therefore, the setting of EXL 576 must be left up to thread context B 104 by smtc_ipi_vector at block 1022 below. Thus, until then, thread A temporarily accomplishes a similar effect to setting EXL 576 by setting IXMT 518 and setting TKSU 589 to kernel mode in the thread context B 104 TCStatus Register 508. Flow proceeds to block 1016. - At
block 1016, at line 115, thread A writes the address of smtc_ipi_vector as the restart address 549 in the TCRestart register 594 of thread context B 104 via the function write_tc_c0_tcrestart, which includes an MTTR instruction 400. Flow proceeds to block 1018. - At
block 1018, at line 65, thread A un-halts, or restarts, thread context B 104 to cause smtc_ipi_vector to begin running on thread context B 104. Flow proceeds to block 1022. - At
block 1022, at lines 163-165, the smtc_ipi_vector sets EXL 576, which has the effect of disabling interrupts and setting the execution privilege level to kernel mode for all thread contexts 104 bound to the VPE 102. It is noted that at line 160 the smtc_ipi_vector disables multithreading on the VPE 102 before setting EXL 576. Additionally, if multithreading was enabled prior to line 160, the code restores multithreading at lines 168-170. It is also noted that if thread context B 104 was in user mode when halted at block 1002, the smtc_ipi_vector sets the CU0 bit 572 of the Status Register 571. Flow proceeds to block 1024. - At
block 1024, at lines 196 and 198, the smtc_ipi_vector restores the thread context B 104 pre-halted TCStatus Register 508 value that was saved at block 1006, and in particular restores its execution privilege level and interrupt exemption state. Flow proceeds to block 1026. - At
block 1026, at lines 200-201, the smtc_ipi_vector loads the EPC Register 598 with the thread context B 104 pre-halted TCRestart register 594 value saved at block 1008. Consequently, when the standard Linux SMP return from interrupt code subsequently executes an ERET instruction at block 1036, thread B will be restarted on thread context B 104 at the address at which it was halted at block 1002. Thus, by setting EXL 576 at block 1022 and populating the EPC Register 598 at block 1026, the smtc_ipi_vector effectively emulates what the microprocessor 100 hardware would do if thread context B 104 had been selected to service the asynchronous interrupt (rather than thread context A 104). Flow proceeds to block 1028. - At
block 1028, at line 203, the smtc_ipi_vector saves all of the general purpose registers 224 to the stack frame created at block 1004. Flow proceeds to block 1032. - At
block 1032, at line 204 via the CLI macro, the smtc_ipi_vector sets itself to kernel mode execution privilege level and exempts itself from servicing interrupts. It is noted that this is performed only for thread context B 104, not for the entire VPE 102. It is noted that the CLI macro is a standard Linux macro which is modified to support SMTC by setting kernel mode execution privilege level and exempting from interrupt servicing (via the IXMT bit 518) only the invoking thread context 104, rather than the entire VPE 102 (as the non-SMTC code does by clearing the IE bit 577 of the Status Register 571 of FIG. 5M), as shown at lines 227-247. Flow proceeds to block 1034. - At
block 1034, at lines 205-210, the smtc_ipi_vector calls the common IPI handler (which is ipi decode, as populated at line 108) with the IPI message reference saved on the stack frame atblock 1012 as an argument. Flow proceeds to block 1036. - At
block 1036, atline 212, after the operating system IPI handler returns, the smtc_ipi_vector jumps to the standard operating system return from interrupt code (which in Linux SMP is ret_from_irq), which eventually executes an ERET instruction to return execution onthread context B 104 to thread B with its pre-halted execution privilege level and interrupt exemption state. Prior to executing the ERET instruction, the return from interrupt code restores theEPC Register 598 with the restart address value saved atblock 1008 and restores theStatus Register 571KSU bits 574 with the value saved atblock 1006. Flow ends atblock 1036. - Referring now to
FIG. 11, a flowchart illustrating a method for performing preemptive process scheduling by a symmetric multiprocessor operating system (SMP OS), such as Linux SMP, on the microprocessor 100 of FIG. 1 according to the present invention is shown. Symmetric multiprocessor operating systems manage a plurality of processes, or tasks, and assign the execution of the processes to particular processors, or CPUs, of the symmetric multiprocessor system, which are thread contexts 104 in the case of microprocessor 100. Within the set of processes assigned to execute on a given CPU, or thread context 104, the preemptive SMP OS schedules the set of processes to run on the assigned thread context 104 in some time-multiplexed fashion according to the scheduling algorithm of the SMP OS. Flow begins at block 1102. - At
block 1102, a timer generates an interrupt request to aVPE 102, which are the exception domains of themicroprocessor 100. In one embodiment, the timer interrupt request is an asynchronous hardware interrupt generated by the MIPS PRA Count/Compare register pairs of one of theVPEs 102 ofmicroprocessor 100, and the Count/Compare register pairs of theother VPEs 102 are all disabled. Flow proceeds to block 1104. - At
block 1104, the interruptedVPE 102 selects aneligible thread context 104 bound to itself to service the timer interrupt request. As described above, in the MIPS MT ASE, athread context 104 is eligible if itsIXMT bit 518 is clear and thecurVPE field 558 of theTCBind Register 556 ofFIG. 5K specifies to whichVPE 102 thethread context 104 is bound. In one embodiment, the method for choosing theeligible thread context 104 to service an asynchronous exception is implementation-dependent and may be adapted to satisfy the particular application in which themicroprocessor 100 is employed. For example, theVPE 102 may select aneligible thread context 104 in a random fashion. For another example, theVPE 102 may select aneligible thread context 104 in a round-robin order. For another example, theVPE 102 may select athread context 104 based on the relative priorities of thethread contexts 104, such as selecting thethread context 104 having the lowest relative instruction issue priority, or a lowest relative priority for servicing exceptions. Flow proceeds to block 1106. - At
block 1106, theVPE 102 suspends execution of the threads executing on allthread contexts 104 bound to theVPE 102 except for thethread context 104 selected atblock 1104. In particular, theVPE 102 ceases to issue instructions to the execution pipeline of the threads. Flow proceeds to block 1108. - At
block 1108, theVPE 102 saves the restart address of the selectedthread context 104 into theEPC Register 598, sets theEXL bit 576 of theStatus Register 571, and populates the MIPS PRA Cause register 536, all of the VPE's 102Coprocessor 0VPE context 106. Flow proceeds to block 1112. - At
block 1112, theVPE 102 causes the selectedthread context 104 to execute a general exception handler at the general exception vector according to the MIPS PRA. The general exception handler decodes the cause of the exception via the MIPS PRA Cause register 536 andStatus Register 571 and determines the exception was an asynchronous hardware interrupt generated by the timer. Consequently, the general exception handler calls the timer interrupt service routine, which among other functions, schedules processes according to the preemptive multitasking algorithm of the operating system. In one embodiment, the timer interrupt routine may call a separate routine dedicated to scheduling processes. Flow proceeds to block 1114. - At
block 1114, the timer interrupt service routine determines whether a new process, or task, should be scheduled on the selectedthread context 104 according to the SMP OS multitasking scheduling algorithm. If so, the timer interrupt service routine schedules a new process to run on the selectedthread context 104; otherwise, the timer interrupt service routine leaves the current process to run on the selectedthread context 104. It is noted that a thread and a process herein are not necessarily synonymous. A process is an entity managed by the SMP operating system, and typically comprises entire programs, such as application programs or portions of the operating system itself; whereas a thread is simply a stream of instructions, which of course may be a stream of instructions of an operating system process, or task. Flow proceeds to block 1116. - At
block 1116, the timer interrupt service routine issues a software-emulated inter-processor interrupt to each other thread context 104 in the microprocessor 100, according to FIG. 10 and/or the source code listing. In particular, if the target thread context 104 is bound to the same VPE 102 as the selected thread context 104 and the target thread context 104 is not exempted from servicing exceptions (as determined by the IXMT bit 518), then the timer interrupt service routine performs a software-emulated inter-processor interrupt to the target thread context 104 according to FIG. 10; if the target thread context 104 is bound to the same VPE 102 as the selected thread context 104 but the target thread context 104 is exempted from servicing exceptions, then the timer interrupt service routine places the timer interrupt service IPI message on the target thread context's 104 queue at lines 48-62 of the source code; and if the target thread context 104 is bound to a different VPE 102 than the selected thread context 104, then the timer interrupt service routine places an IPI message on a queue associated with the target thread context 104 and issues a MIPS PRA asynchronous software interrupt to the target VPE 102, i.e., to the VPE 102 to which the target thread context 104 is bound, according to lines 23-28 of the source code, which will cause the queue to be sampled and drained. Flow proceeds to block 1118. - At
block 1118, the timer interrupt service routine calls the operating system return from interrupt code, which executes an ERET instruction. If a new process was scheduled to run at block 1114, then the ERET causes the newly scheduled process to run; otherwise, the ERET causes the process that was interrupted by the timer interrupt request to continue running. Flow proceeds to block 1122. - At
block 1122, eachthread context 104 that was the target of a software-emulated inter-processor interrupt performed atblock 1116 eventually calls the inter-processor interrupt service routine, according to block 1034 ofFIG. 10 , after performing the other steps ofFIG. 10 . On eachthread context 104, the inter-processor interrupt service routine calls the timer interrupt service routine, which schedules a new process to run on thethread context 104, if appropriate, similar to the manner described above with respect to block 1114. When the inter-processor interrupt handler completes, the operating system return from interrupt code is called, which executes an ERET instruction, according to block 1036 ofFIG. 10 . If the timer interrupt service routine scheduled a new process to run on thethread context 104, then the newly scheduled process will run on thethread context 104 when the return from interrupt code executes the ERET atblock 1036 ofFIG. 10 , rather than thread B, i.e., rather than the thread that was halted by the software-emulated directed inter-processor interrupt. If so, thread B will eventually be scheduled to run again so that it may complete. If the timer interrupt service routine did not schedule a new process to run on thethread context 104, then thread B will continue running when the ERET is executed. Flow ends atblock 1122. - As may be observed from
FIG. 11, the software emulation of directed exceptions described according to FIG. 10 enables the SMP OS to treat each thread context as an operating system level CPU, in particular with regard to preemptive process scheduling. - Referring now to
FIG. 12, a block diagram illustrating a prior art multiprocessor system 1200 is shown. The multiprocessor system 1200 comprises a plurality of CPUs, denoted CPU 0 through CPU 3. Each of the CPUs is a conventional MIPS Architecture processor, i.e., without the benefit of the MIPS MT ASE. Each of the CPUs includes a MIPS PRA Coprocessor 0 Status register 571, Context Register 527, Cause Register 536, and EntryHi Register 526, substantially similar to those shown in FIGS. 5M, 5N, 5P, and 5N, respectively, and as described in the MIPS32® Architecture for Programmers Volume III: The MIPS32® Privileged Resource Architecture, Document Number: MD00090, Revision 2.50, Jul. 1, 2005. In addition, each of the CPUs comprises its own translation lookaside buffer (TLB) 1202 and floating point unit (FPU) 1206. The FPU 1206, commonly referred to as Coprocessor 1 in the MIPS PRA, is a processing unit specifically designed for expeditiously executing floating point instructions in hardware rather than emulating execution of the floating point instruction in software. The TLB 1202 is a relatively small cache memory used to cache recently used virtual to physical address translations. The TLB 1202 is part of a memory management unit (MMU) of each CPU that enables the CPU to provide virtual memory functionality to programs executing thereon. The MIPS32® Architecture for Programmers Volume III: The MIPS32® Privileged Resource Architecture, Document Number: MD00090, Revision 2.50, Jul. 1, 2005 describes in more detail the organization and operation of the TLB 1202 and MMU. As described in the MIPS PRA document, the TLB 1202 and Coprocessor 0 Registers (including the interrupt control registers) are privileged resources, as are the shared TLB 1302 and shared Coprocessor 0 Registers of each VPE 102 (including the interrupt control registers) of FIG. 13.
In one aspect, the MIPS ISA includes privileged instructions (e.g., tlbr, tlbwr, tlbwi, tlbp, mfc0, mtc0) for accessing theTLB 1202/1302 andCoprocessor 0 Registers (including the interrupt control registers) that may not be executed by user privilege level threads, but may only be accessed by threads with kernel privilege level; otherwise, an exception is generated. Finally, as shown inFIG. 12 , the operating system, such as SMP Linux, maintains anASID cache 1204 for each CPU. - An ASID is an address space identifier, which identifies a unique memory map. A memory map comprises a mapping, or association, or binding, between a virtual address space and a set of physical page addresses. Most commonly, the operating system creates a memory map when it creates a new process, or task. Each process created by the operating system has a memory map. Additionally, the operating system has its own memory map. Multiple processes may share a memory map. Consequently, two CPUs using a shared memory map will result in the same virtual address accessing the same physical memory, or generating identical page fault exceptions. An example in a UNIX-like operating system of two processes sharing a memory map is when a process makes a fork( ) system call (not to be confused with the MIPS MT ASE FORK instruction). In this case, a new process is created which shares the same memory map as its parent process until such time as one of the processes performs a store to memory which would change the contents of the memory. Additionally, and perhaps more commonly, a multithreaded process may have multiple threads running in the same address space using the same memory map. Still further, multiple processes may specifically designate particular memory pages that they share.
- In some embodiments, a memory map comprises a simple contiguous array of page table entries, with each entry specifying a virtual to physical page address translation and other relevant page attribute information. However, because a linear page table may require a significant amount of contiguous memory per process (such as in an embedded application with relatively small pages such as 4 KB pages with a relatively large address space), other memory map schemes may be employed. For example, a multi-level page/segment table structure may be employed in which a memory map is described by a segment table which in turn points to a set of page table entries, some of which (in particular, those which correspond to unpopulated parts of the address space) may be common to multiple memory maps.
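The multi-level scheme described above can be illustrated with a minimal C sketch. It assumes a 32-bit virtual address, 4 KB pages, and a hypothetical split of 10 segment-index bits and 10 page-index bits; the type and function names (`mem_map_t`, `mem_map_init`, `mem_map_translate`) are illustrative only and do not appear in the source code listing. Every segment slot of a new memory map initially points at one shared, all-invalid page table, which models the page table entries that may be common to multiple memory maps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12u    /* 4 KB pages, as in the example in the text */
#define INDEX_BITS    10u    /* assumption: 10 index bits per table level */
#define TABLE_ENTRIES (1u << INDEX_BITS)

typedef struct {
    uint32_t pfn;    /* physical frame number */
    uint8_t  valid;  /* nonzero when the translation is populated */
} pte_t;

typedef struct { pte_t pte[TABLE_ENTRIES]; } page_table_t;

/* A memory map: a segment table whose slots point to page tables. */
typedef struct { page_table_t *seg[TABLE_ENTRIES]; } mem_map_t;

/* One all-invalid page table shared by the unpopulated parts of every map. */
static page_table_t empty_table;

static void mem_map_init(mem_map_t *map)
{
    assert(map != NULL);
    for (size_t i = 0; i < TABLE_ENTRIES; i++)
        map->seg[i] = &empty_table;
}

/* Returns 1 and stores the physical address on success; returns 0 when the
 * page is not mapped (where a real system would raise a page fault). */
static int mem_map_translate(const mem_map_t *map, uint32_t va, uint32_t *pa)
{
    uint32_t seg_idx = va >> (PAGE_SHIFT + INDEX_BITS);
    uint32_t pte_idx = (va >> PAGE_SHIFT) & (TABLE_ENTRIES - 1u);
    const pte_t *e = &map->seg[seg_idx]->pte[pte_idx];

    if (!e->valid)
        return 0;
    *pa = (e->pfn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1u));
    return 1;
}
```

In this sketch a memory map that populates only a few segments pays for only those page tables, whereas a linear table would need one contiguous entry per virtual page.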
- The
ASID cache 1204 is a kernel variable maintained in the system memory for each of the CPUs. The operating system uses theASID cache 1204 to assign a new ASID to a newly created memory map, or to assign a new ASID for the respective CPU to an existing memory map that was previously used on another CPU. The operating system initializes eachASID cache 1204 value to zero. Each time the instance of the operating system executing on a respective CPU assigns a new ASID value from theASID cache 1204, the operating system monotonically increments theASID cache 1204 value of the respective CPU. This process continues until theASID cache 1204 value wraps back to zero and the cycle continues. - Generally speaking, the
TLB 1202 is a small cache memory in which each entry includes a tag portion and a data portion. The tag portion includes a virtual page address, or virtual page number (VPN), portion that is concatenated with an ASID portion. When the CPU generates a virtual memory address to make a memory access, such as when a load or store instruction is executed, the virtual memory address is concatenated with the ASID of the process making the memory access, and the result is compared with theTLB 1202 tags to see if a match occurs. The ASID of the process making the memory access is supplied by theASID field 538 of theEntryHi Register 526 ofFIG. 5N of the CPU executing the process. Each time the conventional operating system schedules a process to run on a CPU, i.e., swaps the process in to the CPU, the operating system loads the ASID identifying the memory map of the thread into theEntryHi Register 526 so that the ASID of the process making the memory access is supplied by theASID field 538 of theEntryHi Register 526. If a match does not occur (a TLB miss), the CPU generates a TLB miss exception, and the operating system responsively fetches the missing page address translation information from the appropriate memory map, allocates an entry in theTLB 1202, and fills the entry with the fetched page address translation information. If a match does occur, theTLB 1202 outputs the data portion of the matching entry, which includes a physical page address, or physical frame number (PFN), and attributes of the memory page. Advantageously, because theTLB 1202 tag includes the ASID, theTLB 1202 can simultaneously cache address translations for multiple memory maps. It is noted that because each CPU in theconventional system 1200 has itsown ASID cache 1204, the ASID name spaces of each of the CPUs overlap. However, this overlap of the ASID name space in theconventional system 1200 functions properly since each CPU in thesystem 1200 has itsown TLB 1202. 
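The tag match and the per-CPU ASID assignment described above can be sketched in C as follows. This is a simplified model under stated assumptions, not the code of the source listing: the TLB size, the structure layout, and the names `asid_alloc`, `tlb_fill`, and `tlb_lookup` are hypothetical, the ASID is taken to be the 8-bit field of the MIPS32 EntryHi register, and details such as the global bit and page masks are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 16u    /* hypothetical TLB size */
#define ASID_MASK   0xffu  /* 8-bit ASID field, as in MIPS32 EntryHi */

typedef struct {
    uint32_t vpn;    /* tag: virtual page number */
    uint32_t pfn;    /* data: physical frame number */
    uint8_t  asid;   /* tag: address space identifier */
    uint8_t  valid;
} tlb_entry_t;

typedef struct { tlb_entry_t entry[TLB_ENTRIES]; } tlb_t;

/* Per-CPU ASID cache: increment, wrapping back to zero after ASID_MASK. */
static uint8_t asid_alloc(uint32_t *asid_cache)
{
    assert(asid_cache != NULL);
    *asid_cache = (*asid_cache + 1u) & ASID_MASK;
    return (uint8_t)*asid_cache;
}

/* Fill one entry, as the TLB miss handler would after reading the memory map. */
static void tlb_fill(tlb_t *tlb, unsigned idx, uint32_t vpn, uint8_t asid,
                     uint32_t pfn)
{
    tlb_entry_t *e = &tlb->entry[idx % TLB_ENTRIES];
    e->vpn = vpn;
    e->pfn = pfn;
    e->asid = asid;
    e->valid = 1;
}

/* Returns 1 and stores the PFN when both the VPN and the ASID match a valid
 * entry; returns 0 on a TLB miss. */
static int tlb_lookup(const tlb_t *tlb, uint32_t vpn, uint8_t asid,
                      uint32_t *pfn)
{
    assert(pfn != NULL);
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        const tlb_entry_t *e = &tlb->entry[i];
        if (e->valid && e->vpn == vpn && e->asid == asid) {
            *pfn = e->pfn;
            return 1;
        }
    }
    return 0;
}
```

If two CPUs each start from an ASID cache value of zero, the first call to `asid_alloc` on each returns the same ASID for what may be two different memory maps. In this model that is harmless only because each lookup consults that CPU's own `tlb_t` instance.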
However, as discussed below, the present invention modifies the operating system to employ acommon ASID cache 1304 ofFIG. 13 since the CPUs (thread contexts 104) share acommon TLB 1302 in thesystem 100 of the present invention. - In the
prior art system 1200 ofFIG. 12 , each CPU comprises the entire architectural state of a MIPS Architecture processor, and in particular, includes all the state expected by a conventional SMP operating system, such as SMP Linux for MIPS, to be a MIPS CPU. In other words, the operating system views thesystem 1200 ofFIG. 12 as having a number of CPUs equal to the number of actual full architectural state CPUs, which inFIG. 12 is four. In contrast, the operating system views thesystem 100 ofFIG. 13 of the present invention as having a number of CPUs equal to the number ofthread contexts 104, which inFIG. 13 is M+1, each of which is a lightweight, highly scalable set of state that comprises far less than the full architectural state of a MIPS Architecture processor. - Referring now to
FIG. 13 , a block diagram illustrating amultiprocessor system 100 according to the present invention is shown. Themultiprocessor system 100 ofFIG. 13 is similar to themultiprocessor system 100 ofFIG. 1 ; however, the operating system running on thesystem 100 ofFIG. 13 views eachthread context 104 as a separate CPU, or processor. This is in contrast to theconventional system 1200 ofFIG. 12 , and is also in contrast to a MIPS MT ASE processor-based system in which the operating system is configured to view eachVPE 102 as a CPU. - The
system 100 of FIG. 13 includes a plurality of thread contexts 104, denoted TC 0 104 through TC M 104. The system 100 includes a plurality of VPEs 102 denoted VPE 0 102 through VPE N 102. Each TC 104 includes a TCStatus register 508 of FIG. 5J, a TCBind register 556 of FIG. 5K, and a TCContext register 595 of FIG. 5L. Each VPE 102 includes a Status Register 571 of FIG. 5M, a Context register 527 of FIG. 5N, a Cause Register 536 of FIG. 5P, and an EntryHi Register 526 of FIG. 5N. The thread contexts 104 and VPEs 102 of the system 100 comprise more state than shown in FIG. 13, and in particular, include all the state as described above with respect to FIGS. 1 through 11; however, the state shown in FIG. 13 is included for its relevance to the remaining Figures. - The
system 100 ofFIG. 13 also includes aTLB 1302,ASID cache 1304, andFPU 1306 that are shared by all of thethread contexts 104 in thesystem 100. Additionally, as described in detail above,multiple thread contexts 104 bound to aVPE 102 share interrupt control logic with the VPE's 102 exception domain. Consequently, conventional MP operating systems, such as Linux SMP, must be modified according to the present invention to accommodate the sharing of theTLB 1302,ASID cache 1304, interrupt control logic, andFPU 1306 by themultiple thread contexts 104, as described herein. Embodiments are contemplated in whichmultiple FPU contexts 1306 are shared among the CPUs/TCs 104. One embodiment of the sharedTLB 1302 is described in co-pending U.S. patent application Ser. No. 11/075,041 (MIPS.0203-00-US), having a common assignee, which is hereby incorporated by reference in its entirety. SMTC Linux sets theSTLB bit 511 of theMVPControl Register 501 ofFIG. 5B to enable all of theVPEs 102 to share theTLB 1302. Other embodiments are contemplated in which aTLB 1302 is present for eachVPE 102 and theTLB 1302 is shared by all of thethread contexts 104 bound to theVPE 102. In contrast to thesystem 1200 ofFIG. 12 , when a processor or thread performs a memory access, the ASID of the thread making the memory access is supplied by theTASID field 528 of theTCStatus Register 508 of thethread context 104 executing the thread, rather than by theASID field 538 of theEntryHi Register 526, since theEntryHi Register 526 ofFIG. 5N is only instantiated on a per-VPE 102 basis, not a per-TC 104 basis. Each time the SMTC-aware operating system schedules a thread to run on a CPU/TC 104, i.e., swaps the process in to the CPU/TC 104, the operating system loads the ASID identifying the memory map of the thread into theTASID field 528 of theTCStatus Register 508 of thethread context 104 so that the ASID of the process making the memory access is supplied by theTASID field 528. 
In one embodiment, the operating system writes the ASID into theASID field 538 of theEntryHi Register 526, which propagates through to theTASID field 528. - Each of the CPUs in the
system 1200 ofFIG. 12 executes an instance of the Linux kernel and has a distinct value being returned from the smp_processor_id( ) function that can be used to access facilities that are instantiated for each CPU, such as local run queues and inter-processor interrupts. Similarly, eachthread context 104 in thesystem 100 ofFIG. 13 executes an instance of the SMTC Linux kernel and has a distinct value being returned from the smp_processor_id( ) function that can be used to access facilities that are instantiated for each CPU, such as local run queues and inter-processor interrupts. That is, eachthread context 104 comprises a set of hardware storage elements that store sufficient state to execute a Linux thread, either a thread of the operating system or a user thread. In addition, thesystem 1200 ofFIG. 12 includes one of the CPUs which is designated the first, or primary, Linux CPU that is used during the SMP Linux for MIPS boot sequence to perform low-level, system wide initialization, and contrive for all other CPUs to begin executing their instances of the Linux kernel at the SMP start_secondary( ) entry point. Similarly, thesystem 100 ofFIG. 13 includes one of thethread contexts 104, namely thethread context 104 which has a value of zero in theCurTC field 557 of theTCBind Register 556 ofFIG. 5K , which is designated the primary Linux CPU, is used during the SMTC Linux boot sequence to perform low-level, system wide initialization, and contrive for all other CPUs/TCs 104 to begin executing their instances of the Linux kernel at the SMP start_secondary( ) entry point. In particular, each CPU/TC 104 executes an instance of the SMP Linux process scheduler which schedules the processes, or threads, to execute on the CPU/TC 104. 
That is, each instance of the process scheduler determines the particular thread that will be allowed to employ thethread context 104 resources (e.g.,program counter 222, general purpose registers 224, integer multiplier, etc) to execute the thread during a particular time slice. In one embodiment, the Linux process scheduler running on each CPU/TC 104 maintains its own run queue of threads to execute. Still further, each CPU in thesystem 1200 ofFIG. 12 has an entry in the SMP Linux for MIPS cpu_data array, anentry 1408 of which is shown inFIG. 14 . Similarly, eachthread context 104 in thesystem 100 ofFIG. 13 has anentry 1408 in the SMTC Linux cpu_data array. - Referring now to
FIG. 14, a block diagram of a cpu_data array entry 1408 in an SMTC Linux operating system according to the present invention is shown. The conventional SMP Linux operating system maintains a cpu_data array that includes one entry for each CPU recognized by SMP Linux. The array is indexed by a CPU number assigned to each individual CPU. Each entry stores information, referred to in FIG. 14 as original fields 1402, about the CPU, such as the CPU type, information about the FPU 1306, the size of the TLB 1302, pre-emption timer-related information, and cache-related information. The original fields 1402 of conventional SMP Linux also include the ASID cache 1204 for each CPU, denoted asid_cache in the source code listing at line 447. As discussed below with respect to FIG. 21, although SMTC Linux shares a single ASID cache 1304 among all CPUs/TCs 104 of the system 100 of FIG. 13, in one embodiment SMTC Linux uses the asid_cache storage space in the original fields 1402 effectively as a single ASID cache 1304 by updating each asid_cache field in each cpu_data array entry 1408 even when generating a new ASID value for only a single CPU/TC 104. The SMTC Linux entry 1408 includes two additional fields: the TC_ID field 1404 and the VPE_ID field 1406. The TC_ID field 1404 identifies the thread context 104 of the Linux CPU associated with the cpu_data entry 1408. In particular, the operating system populates the TC_ID field 1404 with the value stored in the CurTC field 557 of the TCBind Register 556 of FIG. 5K of the thread context 104. The value used to index the cpu_data array is referred to as the CPU number. The VPE_ID field 1406 identifies the VPE 102 to which is bound the thread context 104 of the Linux CPU associated with the cpu_data entry 1408. In particular, the operating system populates the VPE_ID field 1406 with the value stored in the CurVPE field 558 of the TCBind Register 556 of FIG. 5K of the thread context 104. - Referring now to
FIG. 15 , a flowchart illustrating operation of the SMTC operating system on asystem 100 ofFIG. 13 according to the present invention is shown. The flowchart illustrates modifications to the conventional SMP Linux to accommodate the fact that thethread contexts 104 share common resources of the system, such as theFPU 1306,TLB 1302, and caches. Flow begins atblock 1502. - At
block 1502, the operating system begins its initialization sequence. Flow proceeds to block 1504. - At
block 1504, the initialization sequence invokes the SMP Linux cpu_probe( ) routine only for TC 0 104, which corresponds to SMTC Linux CPU number 0 (the primary, or boot, CPU/TC 104), in order to populate the cpu_data array entry 1408 at index 0. Flow proceeds to block 1506. - At
block 1506, the initialization sequence copies thecpu_data array entry 1408 atindex 0 to all the other entries in the cpu_data array, i.e., to the entry for each of the other CPUs/TCs 104. Flow proceeds to block 1508. - At
block 1508, the initialization sequence updates the TC_ID field 1404 andVPE_ID field 1406 of thecpu_data array entry 1408 for each of the CPUs/TCs 104 based on theirCurTC field 557 andCurVPE field 558 values, respectively. It is noted that prior to the step atblock 1508, the binding ofthread contexts 104 toVPEs 102 has been performed, i.e., theCurVPE field 558 for eachthread context 104 has been populated. In one embodiment, the operating system performs the binding ofthread contexts 104 toVPEs 102. In another embodiment, the binding ofthread contexts 104 - Referring now to
FIG. 17 , three flowcharts illustrating operation of the SMTC operating system on asystem 100 ofFIG. 13 according to the present invention are shown. The flowcharts illustrate modifications to the conventional SMP Linux interrupt enable and interrupt disable routines to accommodate the fact that although eachthread context 104 is a Linux CPU, the interrupt control logic is not replicated for eachthread context 104, i.e., eachthread context 104 does not have its own interrupt control logic and is thus not its own exception domain; rather, each thread context's 104 exception domain is theVPE 102 to which thethread context 104 is bound, i.e., eachVPE 102 comprises interrupt control logic that is a resource shared by each of thethread contexts 104 bound to theVPE 102, as indicated by theCurVPE bits 558 of theTCBind Register 556 ofFIG. 5K . Flow begins atblock 1702. - At
block 1702, the operating system begins its initialization sequence. Flow proceeds to block 1704. - At
block 1704, the operating system sets theIE bit 577 in theStatus Register 571 ofFIG. 5M in order to enable interrupts globally for allthread contexts 104 of theVPE 102. The operating system performs the step atblock 1704 near the end of its initialization sequence, in particular, after each of the interrupt service routines have been set up and the operating system is ready to begin servicing interrupts. Flow ends atblock 1704. - Flow of the second flowchart of
FIG. 17 begins atblock 1712. - At
block 1712, a thread executing on athread context 104 invokes an interrupt disable routine, such as the CLI macro at source code lines 215-250, on a CPU/TC 104 executing the thread. Flow proceeds to block 1714. - At
block 1714, the interrupt disable routine sets theIXMT bit 518 of theTCStatus Register 508 ofFIG. 5J of thethread context 104 executing the thread, such as is performed in the source code lines 233-240. Advantageously, this disables interrupts only for the CPU/TC 104 executing the interrupt disable routine, rather than for allthread contexts 104 of theVPE 102. Flow ends atblock 1714. - Flow of the third flowchart of
FIG. 16 begins atblock 1722. toVPEs 102 may be performed when themicroprocessor 100 is synthesized or fabricated. Additionally, the initialization sequence updates thecpu_data array entry 1408 for each of the CPUs/TCs 104 to indicate whether it has permission to access theFPU 1306. TheTCU1 bit 581 of theTCStatus Register 508 ofFIG. 5J indicates whether a CPU/TC 104 has permission to access theFPU 1306. It is noted that only a single invocation of the cpu_probe( ) routine is necessary since each of the CPUs/TCs 104 share the same set of resources, namely theFPU 1306,TLB 1302, and caches. Flow proceeds to block 1512. - At
block 1512, the initialization sequence invokes the per_cpu_trap_init( ) routine only for onethread context 104 for eachVPE 102 since theVPE 102 is the exception domain for thethread contexts 104 bound to it; that is, eachthread context 104 is not its own exception domain, particularly since asynchronous exceptions may not be directed specifically to aparticular thread context 104, as discussed above. This is in contrast to conventional SMP Linux in which the per_cpu_trap_init( ) routine is invoked once per CPU, since each CPU in theconventional system 1200 is an exception domain. Flow ends atblock 1512. - Referring now to
FIG. 16 , two flowcharts illustrating operation of the SMTC operating system on asystem 100 ofFIG. 13 according to the present invention are shown. The flowcharts illustrate modifications to the conventional SMP Linux to accommodate the sharing of theFPU 1306 by thethread contexts 104 of thesystem 100 ofFIG. 13 . Flow begins atblock 1602. - At
block 1602, a thread executing on one of thethread contexts 104 includes a floating point instruction. However, thethread context 104 does not have permission to access theFPU 1306. Therefore, a floating point exception is taken so that a floating point instruction emulation may be performed. Flow proceeds to block 1604. - At
block 1604, the operating system increments a count associated with the thread for which the floating point emulation was performed. Flow proceeds todecision block 1606. - At
decision block 1606, the operating system determines whether the count has exceeded a threshold parameter. If not, flow ends; otherwise, flow proceeds to block 1608. - At
block 1608, the operating system sets a cpus_allowed mask, which is a kernel variable, to cause the operating system to schedule the thread on a thread context 104 that has permission to access the FPU 1306 during a subsequent time slice. A time slice is a time quantum used by the operating system to schedule processes or threads and is typically an integer multiple of the timer interrupt time quantum. Flow ends at block 1608. - Flow of the second flowchart of
FIG. 16 begins at block 1612. - At
block 1612, a time slice of the operating system completes and the operating system performs its thread scheduling. Flow proceeds to decision block 1614. - At
decision block 1614, for each running thread, the operating system determines whether the thread executed any floating point instructions during the time slice. In one embodiment, the thread has not executed any floating point instructions during the time slice if the CU1 bit 572 in the Status Register 571 of FIG. 5M is clear. If the thread has executed any floating point instructions during the time slice, flow ends; otherwise, flow proceeds to block 1616. - At
block 1616, the operating system clears the cpus_allowed mask to enable the operating system to schedule the thread on a thread context 104 that does not have permission to access the FPU 1306 during a subsequent time slice. Flow ends at block 1616. - Advantageously, the method described in the flowcharts of
FIG. 16 provides less variability in the execution times of floating point-intensive programs in an SMTC system 100. It is noted that an alternative to the operation of FIG. 16 is to allow the SMP Linux cpu_has_fpu macro to evaluate true only for one CPU/TC 104. However, this alternative would cause extreme variability in the execution times of floating point-intensive programs, depending upon the percentage of their execution time that is scheduled by the operating system on a thread context 104 that does not have permission to access the FPU 1306. - At
block 1722, a thread executing on a thread context 104 invokes an interrupt enable routine, for example a Linux STI macro, on a CPU/TC 104 executing the thread. Flow proceeds to block 1724. - At
block 1724, the interrupt enable routine clears the IXMT bit 518 of the TCStatus Register 508 of FIG. 5J of the thread context 104 executing the thread, similar to, but an inverse operation of, the instructions in the CLI macro. Advantageously, this enables interrupts only for the CPU/TC 104 executing the interrupt enable routine, rather than for all thread contexts 104 of the VPE 102. Flow ends at block 1724. - Referring now to
FIG. 18, a flowchart illustrating operation of the SMTC operating system on a system 100 of FIG. 13 according to the present invention is shown. The flowchart of FIG. 18 illustrates modifications to the conventional SMP Linux general interrupt vector and common return from interrupt code to accommodate the fact that although each thread context 104 is a Linux CPU, each thread context 104 is not its own exception domain; rather, the exception domain of each thread context 104 is the VPE 102 to which the thread context 104 is bound. In particular, the modifications advantageously prevent the undesirable situation in which multiple thread contexts 104 of a VPE 102 would otherwise service the same interrupt request instance. Flow begins at block 1802. - At
block 1802, an interrupt request is activated. In response, the VPE 102 receiving the interrupt request sets the EXL bit 576 in the Status Register 571 of FIG. 5M, which has the effect of disabling the VPE 102 from taking subsequent interrupts. Setting the EXL bit 576 also has the advantageous effect of suspending the instruction scheduler 216 from issuing for execution instructions of the various other thread contexts 104 of the VPE 102 taking the interrupt request. The VPE 102 then selects an eligible thread context 104 to service the interrupt request and causes the general interrupt vector code to commence running on the selected thread context 104. Flow proceeds to block 1804. - At
block 1804, the interrupt vector code saves the contents of the Cause Register 536 of FIG. 5P to the TCContext Register 595 of FIG. 5L of the thread context 104 executing the interrupt vector code. The IP bits 547/548 of the Cause Register 536 of FIG. 5P indicate which interrupt request sources are currently active. In an alternate embodiment, the interrupt vector code saves the contents of the Cause Register 536 to an entry in a table similar to the page table origin or kernel stack pointer tables of FIG. 19 that is indexed by a shifted version of the TCBind Register 556 of FIG. 5K, as described below with respect to FIG. 19. Flow proceeds to block 1806. - At
block 1806, the interrupt vector code masks off the currently active interrupt sources indicated in the Cause Register 536 by setting the corresponding IM bits 573 in the Status Register 571 of FIG. 5M of the VPE 102. Flow proceeds to block 1808. - At
block 1808, the interrupt vector code clears the EXL bit 576, which ends the disabling of interrupts on the VPE 102 that took effect at block 1802. Flow proceeds to block 1812. - At
block 1812, the interrupt vector code decodes the interrupt sources based on the Cause Register 536 contents and transfers control to the appropriate interrupt handlers registered to handle interrupts for the specific types of active interrupt sources. Flow proceeds to block 1814. - At
block 1814, the interrupt source-specific interrupt handler clears the interrupt source and services the interrupt source. Flow proceeds to block 1816. - At
block 1816, the interrupt handler invokes the common return from interrupt code to restore the context and return from the interrupt. Flow proceeds to block 1818. - At
block 1818, the return from interrupt code reads the TCContext Register 595 and unmasks the interrupt sources indicated therein as previously having been inactive by clearing the corresponding IM bits 573 in the Status Register 571. Flow ends at block 1818. - It is noted that a kernel variable in memory could be used instead of the
TCContext Register 595 to save the Cause Register 536 contents. However, using the TCContext Register 595 is more efficient, and is particularly appropriate in an embodiment in which the value must be saved and restored on a context switch. - In addition to the modifications described in
FIG. 18, SMTC Linux also provides an SMTC-specific setup_irq( ) routine that SMTC-aware device drivers may invoke to set up their interrupt handlers by passing an additional mask parameter that specifies interrupt sources that the interrupt handler will re-enable explicitly during the servicing of the exception. In particular, the clock timer device driver in SMTC Linux is SMTC-aware and invokes the SMTC-specific setup_irq( ) routine. - Referring now to
FIG. 19, two flowcharts and two block diagrams illustrating operation of the SMTC operating system on a system 100 of FIG. 13 according to the present invention are shown. The flowcharts and block diagrams of FIG. 19 illustrate modifications to the conventional SMP Linux TLB miss handler, get_kernel_sp( ), and set_kernel_sp( ) routines, to accommodate the fact that the Context Register 527 of FIG. 5N, used by the conventional SMP Linux TLB miss handler, get_kernel_sp( ), and set_kernel_sp( ) routines, is instantiated on a per-VPE 102 basis, rather than a per-TC 104 basis. Flow begins at block 1902. - At
block 1902, the VPE 102 invokes the operating system TLB miss handler in response to a TLB miss exception. It is noted that in a MIPS Architecture processor, the operating system is responsible for handling TLB 1302 misses. That is, the operating system is responsible for updating the TLB 1302 with the appropriate virtual to physical page translation information if the information is missing in the TLB 1302. This is in contrast to some processor architectures in which the processor hardware automatically fills the TLB on a TLB miss. Flow proceeds to block 1904. - At
block 1904, the TLB miss handler reads the TCBind Register 556 of FIG. 5K of the exception-causing thread context 104 (which the VPE 102 selects to service the TLB miss exception) and shifts the value right by 19 bits (or 18 bits if dealing with 64-bit quantities) to obtain an offset into a table of 32-bit page table origin values, or page table base address values, and adds the offset to the base address of the table to obtain a pointer to the page table origin of the thread context 104 executing the thread that caused the TLB miss exception, as shown in the corresponding block diagram. In one embodiment, the base address of the table is fixed at compile time of the operating system. Flow ends at block 1904. - Flow of the second flowchart of
FIG. 19 begins at block 1912. - At
block 1912, a thread invokes the operating system get_kernel_sp( ) or set_kernel_sp( ) routine to get or set, respectively, the kernel stack pointer value for the CPU/TC 104 executing the thread. Flow proceeds to block 1914. - At
block 1914, the invoked routine reads the TCBind Register 556 of FIG. 5K of the invoking thread context 104 and shifts the value right by 19 bits (or 18 bits if dealing with 64-bit quantities) to obtain an offset into a table of 32-bit kernel stack pointer values, and adds the offset to the base address of the table to obtain a pointer to the kernel stack pointer, as shown in the corresponding block diagram. In one embodiment, the base address of the table is fixed at compile time of the operating system. Flow ends at block 1914. - It is noted that conventional SMP Linux for MIPS uses the
PTEBase field 542 of the Coprocessor 0 Context Register 527 of FIG. 5N to store a value that may be used as a pointer to CPU-unique values in a system such as the system 1200 of FIG. 12. However, SMTC operating systems require a per-TC storage location such as TCBind 556, which is provided in system 100 of FIG. 13 for each thread context 104, rather than a per-VPE 102 storage location, since SMTC operating systems view each thread context 104 as a CPU. - Referring now to
FIG. 20, a flowchart illustrating operation of the SMTC operating system on a system 100 of FIG. 13 according to the present invention is shown. The flowchart illustrates modifications to conventional SMP Linux to accommodate the fact that the thread contexts 104 share a common TLB 1302. In particular, TLB 1302 maintenance routines may read and write entries in the shared TLB 1302; therefore, the operating system prevents multiple CPU/TCs 104 from maintaining the shared TLB 1302 at the same time. In particular, the second-level TLB page fault handler performs a TLB probe and re-write sequence and may be invoked at any time due to a user-mode access. Consequently, a software spin-lock is an insufficient arbiter of access to the TLB 1302 management resources. Flow begins at block 2002. - At
block 2002, a thread executing on a CPU/TC 104 invokes a TLB 1302 maintenance routine. Flow proceeds to block 2004. - At
block 2004, the routine disables interrupts. In one embodiment, the routine disables interrupts only on the executing thread context 104, such as via a CLI described above. In another embodiment, the routine disables interrupts on the entire VPE 102 to which the thread context 104 is bound by clearing the IE bit 577 of the Status Register 571 of FIG. 5M to disable VPE 102 interrupts. Flow proceeds to block 2006. - At
block 2006, the routine inhibits multi-VPE 102 operation, i.e., inhibits concurrent execution of threads other than the thread executing the routine. That is, the routine prevents the instruction scheduler 216 from dispatching to the execution units 212 instructions from any of the VPEs 102 of the system 100 other than the VPE 102 to which the thread context 104 executing the routine is bound and from dispatching from any of the thread contexts 104 bound to the VPE 102 except the thread context 104 executing the routine. In one embodiment, the routine executes a MIPS MT ASE DVPE instruction to disable multi-VPE operation. Flow proceeds to block 2008. - At
block 2008, the routine performs the specified TLB 1302 maintenance required by the TLB 1302 maintenance routine. Flow proceeds to block 2012. - At
block 2012, the routine restores the multi-VPE operation state that existed on the system 100 prior to performing the step at block 2006. In one embodiment, the routine executes a MIPS MT ASE EVPE instruction to enable multi-VPE operation if that was the previous state. Flow proceeds to block 2014. - At
block 2014, the routine restores the interrupt enable state that existed on the VPE 102 prior to performing the step at block 2004. In one embodiment, the routine clears the IXMT bit 518 in the TCStatus Register 508 of FIG. 5J to enable interrupts for the thread context 104 if that was the previous state. In another embodiment, the routine sets the IE bit 577 in the Status Register 571 of FIG. 5M to enable VPE 102 interrupts if that was the previous state. Flow ends at block 2014. - Referring now to
FIG. 21, a flowchart illustrating operation of the SMTC operating system on a system 100 of FIG. 13 according to the present invention is shown. The flowchart illustrates modifications to conventional SMP Linux to accommodate the fact that the thread contexts 104 share a common ASID cache 1304. - As mentioned above, in a conventional MIPS
SMP Linux system 1200 of FIG. 12, each CPU has its own TLB 1202 and its own ASID cache 1204; however, in an SMTC Linux system 100, all of the CPUs/TCs 104 share a common TLB 1302. Therefore, SMTC Linux must ensure that the same ASID is not assigned to two different memory maps concurrently in use on two different CPUs/TCs 104. Otherwise, the shared TLB 1302 might return the incorrect address translation information for the thread executing on one of the CPUs/TCs 104. This is because, as discussed above, the tags in the TLB 1302 are a concatenation of the ASID and the virtual page number being accessed. Thus, if two different threads running on two different CPUs/TCs 104 using two different memory maps generated the same virtual page address and same ASID, then they would match the same entry in the TLB 1302 and receive the same physical page address; however, this is incorrect since they are using different memory maps, which would be accessing different physical pages. In other words, when the second thread accessed the TLB 1302, the TLB 1302 would return a hit and output the physical page translation for the memory map of the first thread, since the entry would have been allocated and filled when the first thread caused a TLB 1302 miss. - To ensure that the same ASID is not assigned to two different memory maps concurrently in use on two different CPUs/
TCs 104, SMTC Linux shares a common ASID cache 1304 across all CPUs/TCs 104, and serializes use and update of the shared ASID cache 1304 by suspending thread scheduling during the read-modify-write operation of the ASID cache 1304 that is performed when obtaining a new ASID value from the ASID cache 1304. Flow begins at block 2102. - At
block 2102, a thread executing on a thread context 104 requires a new ASID for a memory map for a particular CPU/TC 104. The most common situations in which a new ASID is required for a memory map are when a new memory map is being created or when an ASID generation rollover occurs, as described below. In particular, a thread is being scheduled to run on a CPU/TC 104, i.e., the thread is being swapped in to the CPU/TC 104 by the operating system. Among other things, the operating system loads the general purpose registers 224 of FIG. 2 with the previously saved or initial GPR 224 values and loads the program counter 222 of FIG. 2 of the CPU/TC 104 with the previously saved or initial address of the thread. Furthermore, the operating system looks at which process is associated with the thread being scheduled and which memory map is associated with the process. The operating system data structure describing the memory map contains an array of ASID values. Normally, the operating system takes the ASID value from the data structure entry indexed by the CPU number of the CPU/TC 104 scheduling the thread and loads the ASID value into the EntryHi Register 526 of FIG. 5N. However, if the operating system detects that the ASID value obtained from the data structure entry belongs to a previous generation, then the operating system obtains a new ASID for the memory map for the CPU/TC 104 according to FIG. 21, and programs the EntryHi Register 526 with the new ASID instead of the ASID obtained from the data structure. Flow proceeds to block 2104. - At
block 2104, the operating system gains exclusive access to the shared ASID cache 1304. In one embodiment, the step at block 2104 is performed by disabling interrupts and disabling multi-VPE operation as described with respect to the corresponding blocks of FIG. 20. An example of the step performed at block 2104 is found at lines 274-281 of the source code listing. Flow proceeds to block 2106. - At
block 2106, the operating system increments the current ASID cache 1304 value to obtain the new ASID value. An example of the step performed at block 2106 is found at lines 282 and 285 of the source code listing. Flow proceeds to decision block 2108. - At
decision block 2108, the operating system determines whether the ASID cache 1304 value rolled over to a new generation when it was incremented at block 2106. The ASID cache 1304 rolls over to a new generation as follows. The ASID cache 1304 value is maintained as a 32-bit value. However, the TASID bits 528 of the TCStatus Register 508 of FIG. 5J and the ASID bits 538 of the Coprocessor 0 EntryHi Register 526 of FIG. 5N are physically only 8 bits. When the 32-bit ASID cache 1304 value is incremented to a new value that modulo 256 is zero, an ASID generation rollover has occurred, since the new 8-bit ASID physical values written to the TASID bits 528 and the ASID bits 538 will be of a new ASID generation. That is, the 8-bit physical ASID values are re-used for each possible value of the upper 24 bits of a 32-bit ASID value. However, the same physical ASID value may not be used to identify two different memory maps, or else the TLB 1302 will produce incorrect page translations, as discussed above. Therefore, the operating system performs the ASID generation rollover condition check. An example of the step performed at decision block 2108 is found at line 285 of the source code listing. If the ASID cache 1304 value rolled over, flow proceeds to block 2112; otherwise, flow proceeds to decision block 2114. - At
block 2112, the operating system updates a live ASID table. In addition, when an ASID generation rollover occurs, the operating system updates the new ASID to the first ASID generation value and flushes the shared TLB 1302. A live ASID is an ASID that is in use by another CPU/TC 104. The live ASID table indicates, for each ASID, which CPUs/TCs 104, if any, are currently using the ASID. The operating system updates the live ASID table by reading the TASID field 528 of the TCStatus Register 508 of FIG. 5J to determine the ASID currently being used by each CPU/TC 104, which may advantageously be performed by a series of MFTR instructions 300 in the operating system thread that updates the live ASID table. The operating system avoids obtaining a new ASID that is the same as a live ASID in order to avoid potentially using the same physical ASID value to identify two different memory maps, which might cause the TLB 1302 to produce incorrect page translations, as discussed above. In particular, although the operating system flushes the shared TLB 1302 when an ASID generation rollover occurs, the TASID field 528 of the TCStatus Register 508 of the various thread contexts 104 may still be populated with old generation ASIDs, and could therefore generate new TLB 1302 entry allocations/fills that have old generation ASIDs in their tags. An example of the step of updating the live ASID table performed at block 2112 is found at lines 304-305 of the source code listing. An example of the step of updating the new ASID to the first ASID generation value performed at block 2112 is found at line 310 of the source code listing. An example of the step of flushing the shared TLB 1302 performed at block 2112 is found at line 311 of the source code listing. Flow proceeds to block 2116. - At
decision block 2114, the operating system determines whether the new ASID is equal to a live ASID. An example of the step performed at decision block 2114 is found at line 313 of the source code listing. If the new ASID is equal to a live ASID, flow returns to block 2106 so that the operating system can attempt to obtain a new non-live ASID; otherwise, flow proceeds to block 2116. - At
block 2116, the operating system assigns the new ASID to the memory map for all CPUs/TCs 104 in the system 100. As discussed above, in one embodiment, SMTC Linux uses the asid_cache storage space in the original fields 1402 effectively as a single ASID cache 1304 by updating each asid_cache field in each cpu_data array entry 1408 even when generating a new ASID value for only a single CPU/TC 104; however, other embodiments are contemplated in which a single kernel variable is used to store the single ASID cache 1304. The operating system advantageously assigns the new ASID to the memory map for all CPUs/TCs 104 in order to make more efficient use of the shared TLB 1302, i.e., to avoid the following situation. Assume two processes share a common memory map and execute on different CPUs/TCs 104. In a conventional SMP Linux system 1200, the memory map would be assigned a different ASID for each CPU, since each CPU has its own ASID cache 1204. However, in the shared TLB 1302 system 100, the first time each CPU/TC 104 accessed a shared memory page, the operating system would allocate an entry in the shared TLB 1302 for the page translation since the ASID value differed for each CPU/TC 104, i.e., two TLB 1302 entries would be consumed for the same shared physical page, which would be an inefficient use of the shared TLB 1302 entries. A similar inefficiency could occur when a process was migrated from one CPU/TC 104 to another. Thus, to avoid this situation and make more efficient use of the shared TLB 1302, SMTC Linux assigns the new ASID to the memory map not only for the CPU/TC 104 for which it was obtained, but also causes the new ASID to be assigned to and used by all CPUs/TCs 104 that reference the memory map. Stated alternatively, when the operating system assigns a new ASID to a memory map, if a process uses the memory map, then all threads of the process which use the memory map use the new ASID on all CPUs/TCs 104 that execute the threads.
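The allocation flow of FIG. 21 described so far (blocks 2104 through 2116) can be sketched in user-space C as follows. This is a minimal model under stated assumptions, not the source code listing referenced in the text: the names get_new_asid, struct mem_map, tc_tasid, live_asid, exclusive_begin, and exclusive_end, the count of four CPUs/TCs, and the first-generation value 0x100 are all hypothetical and chosen only for illustration. The sketch also applies the live ASID check after a rollover, which is somewhat stricter than the flow as described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_TCS            4u      /* hypothetical number of CPUs/TCs */
#define ASID_MASK          0xffu   /* low 8 bits form the physical ASID */
#define ASID_FIRST_VERSION 0x100u  /* assumed first ASID generation value */

/* Hypothetical memory map descriptor: one ASID slot per CPU/TC. */
struct mem_map { uint32_t asid[NUM_TCS]; };

static uint32_t asid_cache = ASID_FIRST_VERSION; /* single shared ASID cache */
static uint8_t  tc_tasid[NUM_TCS];               /* models each TC's TASID field */
static bool     live_asid[ASID_MASK + 1u];       /* live ASID table */
static unsigned tlb_flushes;                     /* counts modeled TLB flushes */
static bool     exclusive;                       /* models blocks 2104 / 2118 */

/* Stand-ins for disabling and restoring interrupts and multi-VPE operation. */
static void exclusive_begin(void) { assert(!exclusive); exclusive = true; }
static void exclusive_end(void)   { assert(exclusive);  exclusive = false; }

/* Block 2112: mark every physical ASID still held by some TC as live. */
static void rebuild_live_table(void)
{
    memset(live_asid, 0, sizeof live_asid);
    for (unsigned tc = 0; tc < NUM_TCS; tc++)
        live_asid[tc_tasid[tc]] = true;
}

/* Obtain a new ASID and assign it to the memory map for all CPUs/TCs. */
uint32_t get_new_asid(struct mem_map *mm)
{
    assert(mm != NULL);
    exclusive_begin();                           /* block 2104 */

    uint32_t asid = asid_cache;
    do {
        asid++;                                  /* block 2106 */
        if ((asid & ASID_MASK) == 0) {           /* block 2108: rollover */
            rebuild_live_table();                /* block 2112 */
            tlb_flushes++;                       /* models the shared TLB flush */
            if (asid == 0)                       /* 32-bit value wrapped */
                asid = ASID_FIRST_VERSION;
        }
    } while (live_asid[asid & ASID_MASK]);       /* block 2114: skip live ASIDs */

    asid_cache = asid;
    for (unsigned tc = 0; tc < NUM_TCS; tc++)    /* block 2116 */
        mm->asid[tc] = asid;

    exclusive_end();                             /* block 2118 */
    return asid;
}
```

With asid_cache set to 0x1fe and the four modeled TCs holding physical ASIDs 1, 2, 0, and 5, two successive calls would return 0x1ff and then 0x203: the second call rolls over, flushes once, and skips the live values 0, 1, and 2.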
When a thread using a memory map is swapped into any thread context 104 after a new ASID is assigned to the memory map, the new ASID, rather than an old ASID identifying the memory map, gets loaded into the TASID field 528 of the TCStatus Register 508 of FIG. 5J of the thread context 104. Thus, advantageously, any TLB 1302 entries that were loaded as a result of the thread executing on one CPU/TC 104 will be valid and usable on any other CPU/TC 104 to which the thread subsequently migrates, which would not be the case if the operating system maintained a distinct ASID cache per CPU, as in conventional SMP Linux. An example of the step performed at block 2116 is found at line 320 of the source code listing. Flow proceeds to block 2118. - At
block 2118, the operating system relinquishes exclusive access to the shared ASID cache 1304. In one embodiment, the step at block 2118 is performed by restoring interrupts and multi-VPE operation to their previous states, as described with respect to the corresponding blocks of FIG. 20. An example of the step performed at block 2118 is found at lines 324-329 of the source code listing. Flow ends at block 2118. - Although the present invention and its objects, features, and advantages have been described in detail, other embodiments are encompassed by the invention. For example, although embodiments have been described in which the modified SMP OS is Linux, other SMP operating systems are contemplated for adaptation to run on a multithreading microprocessor having non-independent lightweight thread contexts that share processor state with one another, such as MIPS MT ASE thread contexts, each of which is an independent CPU to the SMP OS. For example, other variants of the UNIX operating system, such as SUN Solaris, HP UX, Mac OS X, Open VMS, and others may be adapted to view the thread contexts as a CPU. Still further, other SMP operating systems such as SMP-capable variants of the Microsoft Windows operating system may be adapted to view the thread contexts as a CPU. Furthermore, although the invention has been described with respect to modifications to an existing SMP operating system, the invention is not limited to existing operating systems, but rather new operating systems may be developed which employ the steps described to employ non-independent lightweight thread contexts that share processor state with one another, such as MIPS MT ASE thread contexts, as independent CPUs to the new SMP OS.
- While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant computer arts that various changes in form and detail can be made therein without departing from the scope of the invention. For example, in addition to using hardware (e.g., within or coupled to a Central Processing Unit (“CPU”), microprocessor, microcontroller, digital signal processor, processor core, System on Chip (“SOC”), or any other device), implementations may also be embodied in software (e.g., computer readable code, program code, instructions and/or data disposed in any form, such as source, object or machine language) disposed, for example, in a computer usable (e.g., readable) medium configured to store the software. Such software can enable, for example, the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available programs and databases. Such software can be disposed in any known computer usable medium such as semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.). The software can also be disposed as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., carrier wave or any other medium including digital, optical, or analog-based medium). Embodiments of the present invention may include methods of providing operating system software described herein by providing the software and subsequently transmitting the software as a computer data signal over a communication network including the Internet and intranets, such as shown in
FIGS. 22 through 24. It is understood that the apparatus and method described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL) and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
Claims (42)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/330,916 US7870553B2 (en) | 2003-08-28 | 2006-01-11 | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
Applications Claiming Priority (9)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US49918003P | 2003-08-28 | 2003-08-28 | |
US50235903P | 2003-09-12 | 2003-09-12 | |
US50235803P | 2003-09-12 | 2003-09-12 | |
US10/684,350 US7376954B2 (en) | 2003-08-28 | 2003-10-10 | Mechanisms for assuring quality of service for programs executing on a multithreaded processor |
US10/684,348 US20050050305A1 (en) | 2003-08-28 | 2003-10-10 | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
US10/929,097 US7424599B2 (en) | 2003-08-28 | 2004-08-27 | Apparatus, method, and instruction for software management of multiple computational contexts in a multithreaded microprocessor |
US11/313,296 US9032404B2 (en) | 2003-08-28 | 2005-12-20 | Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor |
US11/313,272 US7849297B2 (en) | 2003-08-28 | 2005-12-20 | Software emulation of directed exceptions in a multithreading processor |
US11/330,916 US7870553B2 (en) | 2003-08-28 | 2006-01-11 | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
Related Parent Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/313,272 Continuation-In-Part US7849297B2 (en) | 2003-08-28 | 2005-12-20 | Software emulation of directed exceptions in a multithreading processor |
US11/313,296 Continuation-In-Part US9032404B2 (en) | 2003-08-28 | 2005-12-20 | Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor |
Publications (3)
Publication Number | Publication Date |
---|---|
US20060190946A1 (en) | 2006-08-24 |
US20070044106A2 (en) | 2007-02-22 |
US7870553B2 (en) | 2011-01-11 |
Family ID: 46323590
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/330,916 Active 2027-08-12 US7870553B2 (en) | 2003-08-28 | 2006-01-11 | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
Country Status (1)
Country | Link |
---|---|
US (1) | US7870553B2 (en) |
Citations (86)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3665404A (en) * | 1970-04-09 | 1972-05-23 | Burroughs Corp | Multi-processor processing system having interprocessor interrupt apparatus |
US4817051A (en) * | 1987-07-02 | 1989-03-28 | Fairchild Semiconductor Corporation | Expandable multi-port random access memory |
US4843541A (en) * | 1987-07-29 | 1989-06-27 | International Business Machines Corporation | Logical resource partitioning of a data processing system |
US4860190A (en) * | 1985-09-03 | 1989-08-22 | Fujitsu Limited | Computer system for controlling virtual machines |
US5295265A (en) * | 1991-06-04 | 1994-03-15 | Sextant Avionique | Device for enhancing the performance of a real time executive kernel associated with a multiprocessor structure that can include a large number of processors |
US5428754A (en) * | 1988-03-23 | 1995-06-27 | 3Dlabs Ltd | Computer system with clock shared between processors executing separate instruction streams |
US5499349A (en) * | 1989-05-26 | 1996-03-12 | Massachusetts Institute Of Technology | Pipelined processor with fork, join, and start instructions using tokens to indicate the next instruction for each of multiple threads of execution |
US5511192A (en) * | 1991-11-30 | 1996-04-23 | Kabushiki Kaisha Toshiba | Method and apparatus for managing thread private data in a parallel processing computer |
US5515538A (en) * | 1992-05-29 | 1996-05-07 | Sun Microsystems, Inc. | Apparatus and method for interrupt handling in a multi-threaded operating system kernel |
US5542076A (en) * | 1991-06-14 | 1996-07-30 | Digital Equipment Corporation | Method and apparatus for adaptive interrupt servicing in data processing system |
US5606696A (en) * | 1994-09-09 | 1997-02-25 | International Business Machines Corporation | Exception handling method and apparatus for a microkernel data processing system |
US5659786A (en) * | 1992-10-19 | 1997-08-19 | International Business Machines Corporation | System and method for dynamically performing resource reconfiguration in a logically partitioned data processing system |
US5727203A (en) * | 1995-03-31 | 1998-03-10 | Sun Microsystems, Inc. | Methods and apparatus for managing a database in a distributed object operating environment using persistent and transient cache |
US5742822A (en) * | 1994-12-19 | 1998-04-21 | Nec Corporation | Multithreaded processor which dynamically discriminates a parallel execution and a sequential execution of threads |
US5758142A (en) * | 1994-05-31 | 1998-05-26 | Digital Equipment Corporation | Trainable apparatus for predicting instruction outcomes in pipelined processors |
US5790871A (en) * | 1996-05-17 | 1998-08-04 | Advanced Micro Devices | System and method for testing and debugging a multiprocessing interrupt controller |
US5799188A (en) * | 1995-12-15 | 1998-08-25 | International Business Machines Corporation | System and method for managing variable weight thread contexts in a multithreaded computer system |
US5867704A (en) * | 1995-02-24 | 1999-02-02 | Matsushita Electric Industrial Co., Ltd. | Multiprocessor system having processor based idle state detection and method of executing tasks in such a multiprocessor system |
US5892934A (en) * | 1996-04-02 | 1999-04-06 | Advanced Micro Devices, Inc. | Microprocessor configured to detect a branch to a DSP routine and to direct a DSP to execute said routine |
US5933627A (en) * | 1996-07-01 | 1999-08-03 | Sun Microsystems | Thread switch on blocked load or store using instruction thread field |
US5944816A (en) * | 1996-05-17 | 1999-08-31 | Advanced Micro Devices, Inc. | Microprocessor configured to execute multiple threads including interrupt service routines |
US5949994A (en) * | 1997-02-12 | 1999-09-07 | The Dow Chemical Company | Dedicated context-cycling computer with timed context |
US6061710A (en) * | 1997-10-29 | 2000-05-09 | International Business Machines Corporation | Multithreaded processor incorporating a thread latch register for interrupt service new pending threads |
US6088787A (en) * | 1998-03-30 | 2000-07-11 | Celestica International Inc. | Enhanced program counter stack for multi-tasking central processing unit |
US6175916B1 (en) * | 1997-05-06 | 2001-01-16 | Microsoft Corporation | Common-thread inter-process function calls invoked by jumps to invalid addresses |
US6189093B1 (en) * | 1998-07-21 | 2001-02-13 | Lsi Logic Corporation | System for initiating exception routine in response to memory access exception by storing exception information and exception bit within architectured register |
US6205543B1 (en) * | 1998-12-03 | 2001-03-20 | Sun Microsystems, Inc. | Efficient handling of a large register file for context switching |
US6205414B1 (en) * | 1998-10-02 | 2001-03-20 | International Business Machines Corporation | Methodology for emulation of multi-threaded processes in a single-threaded operating system |
US6223228B1 (en) * | 1998-09-17 | 2001-04-24 | Bull Hn Information Systems Inc. | Apparatus for synchronizing multiple processors in a data processing system |
US6240531B1 (en) * | 1997-09-30 | 2001-05-29 | Networks Associates Inc. | System and method for computer operating system protection |
US6253306B1 (en) * | 1998-07-29 | 2001-06-26 | Advanced Micro Devices, Inc. | Prefetch instruction mechanism for processor |
US6286027B1 (en) * | 1998-11-30 | 2001-09-04 | Lucent Technologies Inc. | Two step thread creation with register renaming |
US20020016869A1 (en) * | 2000-06-22 | 2002-02-07 | Guillaume Comeau | Data path engine |
US6401155B1 (en) * | 1998-12-22 | 2002-06-04 | Philips Electronics North America Corporation | Interrupt/software-controlled thread processing |
US20020083173A1 (en) * | 2000-02-08 | 2002-06-27 | Enrique Musoll | Method and apparatus for optimizing selection of available contexts for packet processing in multi-stream packet processing |
US20020083278A1 (en) * | 2000-12-22 | 2002-06-27 | Bull Hn Information Systems Inc. | Method and data processing system for performing atomic multiple word writes |
US20020091915A1 (en) * | 2001-01-11 | 2002-07-11 | Parady Bodo K. | Load prediction and thread identification in a multithreaded microprocessor |
US20020103847A1 (en) * | 2001-02-01 | 2002-08-01 | Hanan Potash | Efficient mechanism for inter-thread communication within a multi-threaded computer system |
US20030014471A1 (en) * | 2001-07-12 | 2003-01-16 | Nec Corporation | Multi-thread execution method and parallel processor system |
US20030018684A1 (en) * | 2001-07-18 | 2003-01-23 | Nec Corporation | Multi-thread execution method and parallel processor system |
US20030028755A1 (en) * | 2001-07-12 | 2003-02-06 | Nec Corporation | Interprocessor register succession method and device therefor |
US20030074545A1 (en) * | 2001-10-12 | 2003-04-17 | Uhler G. Michael | Method and apparatus for binding shadow registers to vectored interrupts |
US20030079094A1 (en) * | 2001-10-19 | 2003-04-24 | Ravi Rajwar | Concurrent execution of critical sections by eliding ownership of locks |
US6560626B1 (en) * | 1998-04-02 | 2003-05-06 | Microsoft Corporation | Thread interruption with minimal resource usage using an asynchronous procedure call |
US20030093652A1 (en) * | 2001-11-14 | 2003-05-15 | Song Seungyoon Peter | Operand file using pointers and reference counters and a method of use |
US20030105796A1 (en) * | 2001-12-05 | 2003-06-05 | Sandri Jason G. | Method and apparatus for controlling access to shared resources in an environment with multiple logical processors |
US20030115245A1 (en) * | 2001-12-17 | 2003-06-19 | Kunimasa Fujisawa | Multi-application execution system and method thereof |
US20030126416A1 (en) * | 2001-12-31 | 2003-07-03 | Marr Deborah T. | Suspending execution of a thread in a multi-threaded processor |
US6591379B1 (en) * | 2000-06-23 | 2003-07-08 | Microsoft Corporation | Method and system for injecting an exception to recover unsaved data |
US6675192B2 (en) * | 1999-10-01 | 2004-01-06 | Hewlett-Packard Development Company, L.P. | Temporary halting of thread execution until monitoring of armed events to memory location identified in working registers |
US20040015684A1 (en) * | 2002-05-30 | 2004-01-22 | International Business Machines Corporation | Method, apparatus and computer program product for scheduling multiple threads for a processor |
US6687812B1 (en) * | 1999-04-20 | 2004-02-03 | Nec Corporation | Parallel processing apparatus |
US6697935B1 (en) * | 1997-10-23 | 2004-02-24 | International Business Machines Corporation | Method and apparatus for selecting thread switch events in a multithreaded processor |
US20040073910A1 (en) * | 2002-10-15 | 2004-04-15 | Erdem Hokenek | Method and apparatus for high speed cross-thread interrupts in a multithreaded processor |
US6738796B1 (en) * | 1999-10-08 | 2004-05-18 | Globespanvirata, Inc. | Optimization of memory requirements for multi-threaded operating systems |
US20040139306A1 (en) * | 2003-01-09 | 2004-07-15 | Sony Corporation | Partial and start-over threads in embedded real-time kernel |
US6779065B2 (en) * | 2001-08-31 | 2004-08-17 | Intel Corporation | Mechanism for interrupt handling in computer systems that support concurrent execution of multiple threads |
US20050050395A1 (en) * | 2003-08-28 | 2005-03-03 | Kissell Kevin D. | Mechanisms for assuring quality of service for programs executing on a multithreaded processor |
US20050050305A1 (en) * | 2003-08-28 | 2005-03-03 | Kissell Kevin D. | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
US20050055504A1 (en) * | 2002-10-08 | 2005-03-10 | Hass David T. | Advanced processor with system on a chip interconnect technology |
US20050120194A1 (en) * | 2003-08-28 | 2005-06-02 | Mips Technologies, Inc. | Apparatus, method, and instruction for initiation of concurrent instruction streams in a multithreading microprocessor |
US6920634B1 (en) * | 1998-08-03 | 2005-07-19 | International Business Machines Corporation | Detecting and causing unsafe latent accesses to a resource in multi-threaded programs |
US6922745B2 (en) * | 2002-05-02 | 2005-07-26 | Intel Corporation | Method and apparatus for handling locks |
US6925550B2 (en) * | 2002-01-02 | 2005-08-02 | Intel Corporation | Speculative scheduling of instructions with source operand validity bit and rescheduling upon carried over destination operand invalid bit detection |
US6986140B2 (en) * | 2000-02-17 | 2006-01-10 | International Business Machines Corporation | Method for determining idle processor load balancing in a multiple processors system |
US6993598B2 (en) * | 2003-10-09 | 2006-01-31 | International Business Machines Corporation | Method and apparatus for efficient sharing of DMA resource |
US7020879B1 (en) * | 1998-12-16 | 2006-03-28 | Mips Technologies, Inc. | Interrupt and exception handling for multi-streaming digital processors |
US7031992B2 (en) * | 2000-09-08 | 2006-04-18 | Quartics, Inc. | Hardware function generator support in a DSP |
US7065094B2 (en) * | 2000-07-05 | 2006-06-20 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and device in a coupling node for a telecommunication system |
US7073042B2 (en) * | 2002-12-12 | 2006-07-04 | Intel Corporation | Reclaiming existing fields in address translation data structures to extend control over memory accesses |
US20060161421A1 (en) * | 2003-08-28 | 2006-07-20 | Mips Technologies, Inc. | Software emulation of directed exceptions in a multithreading processor |
US20060161921A1 (en) * | 2003-08-28 | 2006-07-20 | Mips Technologies, Inc. | Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor |
US7093106B2 (en) * | 2003-04-23 | 2006-08-15 | International Business Machines Corporation | Register rename array with individual thread bits set upon allocation and cleared upon instruction completion |
US20060190945A1 (en) * | 2003-08-28 | 2006-08-24 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread context |
US20060195683A1 (en) * | 2003-08-28 | 2006-08-31 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20060206686A1 (en) * | 2005-03-08 | 2006-09-14 | Mips Technologies, Inc. | Three-tiered translation lookaside buffer hierarchy in a multithreading microprocessor |
US7181600B1 (en) * | 2001-08-02 | 2007-02-20 | Mips Technologies, Inc. | Read-only access to CPO registers |
US7185185B2 (en) * | 1999-05-11 | 2007-02-27 | Sun Microsystems, Inc. | Multiple-thread processor with in-pipeline, thread selectable storage |
US7185183B1 (en) * | 2001-08-02 | 2007-02-27 | Mips Technologies, Inc. | Atomic update of CPO state |
US7216338B2 (en) * | 2002-02-20 | 2007-05-08 | Microsoft Corporation | Conformance execution of non-deterministic specifications for components |
US20070186028A2 (en) * | 2003-08-28 | 2007-08-09 | Mips Technologies, Inc. | Synchronized storage providing multiple synchronization semantics |
US7275246B1 (en) * | 1999-01-28 | 2007-09-25 | Ati International Srl | Executing programs for a first computer architecture on a computer of a second architecture |
US7386636B2 (en) * | 2005-08-19 | 2008-06-10 | International Business Machines Corporation | System and method for communicating command parameters between a processor and a memory flow controller |
US7657683B2 (en) * | 2008-02-01 | 2010-02-02 | Redpine Signals, Inc. | Cross-thread interrupt controller for a multi-thread processor |
US7665088B1 (en) * | 1998-05-15 | 2010-02-16 | Vmware, Inc. | Context-switching to and from a host OS in a virtualized computer system |
US7689867B2 (en) * | 2005-06-09 | 2010-03-30 | Intel Corporation | Multiprocessor breakpoint |
Family Cites Families (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5159686A (en) | 1988-02-29 | 1992-10-27 | Convex Computer Corporation | Multi-processor computer system having process-independent communication register addressing |
US5410710A (en) | 1990-12-21 | 1995-04-25 | Intel Corporation | Multiprocessor programmable interrupt controller system adapted to functional redundancy checking processor systems |
SE9404294D0 (en) | 1994-12-09 | 1994-12-09 | Ellemtel Utvecklings Ab | manner and device in telecommunications |
US6128720A (en) | 1994-12-29 | 2000-10-03 | International Business Machines Corporation | Distributed processing array with component processors performing customized interpretation of instructions |
US5812811A (en) | 1995-02-03 | 1998-09-22 | International Business Machines Corporation | Executing speculative parallel instructions threads with forking and inter-thread communication |
US5835748A (en) | 1995-12-19 | 1998-11-10 | Intel Corporation | Method for executing different sets of instructions that cause a processor to perform different data type operations on different physical registers files that logically appear to software as a single aliased register file |
US5706514A (en) | 1996-03-04 | 1998-01-06 | Compaq Computer Corporation | Distributed execution of mode mismatched commands in multiprocessor computer systems |
JP2882475B2 (en) | 1996-07-12 | 1999-04-12 | 日本電気株式会社 | Thread execution method |
US6647508B2 (en) | 1997-11-04 | 2003-11-11 | Hewlett-Packard Development Company, L.P. | Multiprocessor computer architecture with multiple operating system instances and software controlled resource allocation |
JP3209205B2 (en) | 1998-04-28 | 2001-09-17 | 日本電気株式会社 | Inherit device of register contents in processor |
US7111290B1 (en) | 1999-01-28 | 2006-09-19 | Ati International Srl | Profiling program execution to identify frequently-executed portions and to assist binary translation |
US6330656B1 (en) | 1999-03-31 | 2001-12-11 | International Business Machines Corporation | PCI slot control apparatus with dynamic configuration for partitioned systems |
EP1181648A1 (en) | 1999-04-09 | 2002-02-27 | Clearspeed Technology Limited | Parallel data processing apparatus |
US6986137B1 (en) | 1999-09-28 | 2006-01-10 | International Business Machines Corporation | Method, system and program products for managing logical processors of a computing environment |
US6889319B1 (en) | 1999-12-09 | 2005-05-03 | Intel Corporation | Method and apparatus for entering and exiting multiple threads within a multithreaded processor |
US6671795B1 (en) | 2000-01-21 | 2003-12-30 | Intel Corporation | Method and apparatus for pausing execution in a processor or the like |
US20010052053A1 (en) | 2000-02-08 | 2001-12-13 | Mario Nemirovsky | Stream processing unit for a multi-streaming processor |
US6957432B2 (en) | 2000-03-21 | 2005-10-18 | Microsoft Corporation | Real-time scheduler |
US20010034751A1 (en) | 2000-04-21 | 2001-10-25 | Shinichiro Eto | Real-time OS simulator |
US6668308B2 (en) | 2000-06-10 | 2003-12-23 | Hewlett-Packard Development Company, L.P. | Scalable architecture based on single-chip multiprocessing |
US6480845B1 (en) | 2000-06-14 | 2002-11-12 | Bull Hn Information Systems Inc. | Method and data processing system for emulating virtual memory working spaces |
US6643759B2 (en) | 2001-03-30 | 2003-11-04 | Mips Technologies, Inc. | Mechanism to extend computer memory protection schemes |
US6671791B1 (en) | 2001-06-15 | 2003-12-30 | Advanced Micro Devices, Inc. | Processor including a translation unit for selectively translating virtual addresses of different sizes using a plurality of paging tables and mapping mechanisms |
JP3630118B2 (en) | 2001-07-12 | 2005-03-16 | 日本電気株式会社 | Thread termination method and apparatus, and parallel processor system |
US7428485B2 (en) | 2001-08-24 | 2008-09-23 | International Business Machines Corporation | System for yielding to a processor |
US6877083B2 (en) | 2001-10-16 | 2005-04-05 | International Business Machines Corporation | Address mapping mechanism for behavioral memory enablement within a data processing system |
US7127561B2 (en) | 2001-12-31 | 2006-10-24 | Intel Corporation | Coherency techniques for suspending execution of a thread until a specified memory access occurs |
US20030225816A1 (en) | 2002-06-03 | 2003-12-04 | Morrow Michael W. | Architecture to support multiple concurrent threads of execution on an arm-compatible processor |
US20050033889A1 (en) | 2002-10-08 | 2005-02-10 | Hass David T. | Advanced processor with interrupt delivery mechanism for multi-threaded multi-CPU system on a chip |
US7152170B2 (en) | 2003-02-20 | 2006-12-19 | Samsung Electronics Co., Ltd. | Simultaneous multi-threading processor circuits and computer program products configured to operate at different performance levels based on a number of operating threads and methods of operating |
ES2315469T3 (en) | 2003-04-09 | 2009-04-01 | Virtuallogix Sa | OPERATING SYSTEMS. |
WO2005022384A1 (en) | 2003-08-28 | 2005-03-10 | Mips Technologies, Inc. | Apparatus, method, and instruction for initiation of concurrent instruction streams in a multithreading microprocessor |
US7594089B2 (en) | 2003-08-28 | 2009-09-22 | Mips Technologies, Inc. | Smart memory based synchronization controller for a multi-threaded multiprocessor SoC |
US7600135B2 (en) | 2005-04-14 | 2009-10-06 | Mips Technologies, Inc. | Apparatus and method for software specified power management performance using low power virtual threads |
US7627770B2 (en) | 2005-04-14 | 2009-12-01 | Mips Technologies, Inc. | Apparatus and method for automatic low power mode invocation in a multi-threaded processor |
- 2006-01-11: US application US11/330,916, patent US7870553B2, status: Active
Patent Citations (99)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3665404A (en) * | 1970-04-09 | 1972-05-23 | Burroughs Corp | Multi-processor processing system having interprocessor interrupt apparatus |
US4860190A (en) * | 1985-09-03 | 1989-08-22 | Fujitsu Limited | Computer system for controlling virtual machines |
US4817051A (en) * | 1987-07-02 | 1989-03-28 | Fairchild Semiconductor Corporation | Expandable multi-port random access memory |
US4843541A (en) * | 1987-07-29 | 1989-06-27 | International Business Machines Corporation | Logical resource partitioning of a data processing system |
US5428754A (en) * | 1988-03-23 | 1995-06-27 | 3Dlabs Ltd | Computer system with clock shared between processors executing separate instruction streams |
US5499349A (en) * | 1989-05-26 | 1996-03-12 | Massachusetts Institute Of Technology | Pipelined processor with fork, join, and start instructions using tokens to indicate the next instruction for each of multiple threads of execution |
US5295265A (en) * | 1991-06-04 | 1994-03-15 | Sextant Avionique | Device for enhancing the performance of a real time executive kernel associated with a multiprocessor structure that can include a large number of processors |
US5542076A (en) * | 1991-06-14 | 1996-07-30 | Digital Equipment Corporation | Method and apparatus for adaptive interrupt servicing in data processing system |
US5511192A (en) * | 1991-11-30 | 1996-04-23 | Kabushiki Kaisha Toshiba | Method and apparatus for managing thread private data in a parallel processing computer |
US5515538A (en) * | 1992-05-29 | 1996-05-07 | Sun Microsystems, Inc. | Apparatus and method for interrupt handling in a multi-threaded operating system kernel |
US5659786A (en) * | 1992-10-19 | 1997-08-19 | International Business Machines Corporation | System and method for dynamically performing resource reconfiguration in a logically partitioned data processing system |
US5758142A (en) * | 1994-05-31 | 1998-05-26 | Digital Equipment Corporation | Trainable apparatus for predicting instruction outcomes in pipelined processors |
US5606696A (en) * | 1994-09-09 | 1997-02-25 | International Business Machines Corporation | Exception handling method and apparatus for a microkernel data processing system |
US5742822A (en) * | 1994-12-19 | 1998-04-21 | Nec Corporation | Multithreaded processor which dynamically discriminates a parallel execution and a sequential execution of threads |
US5867704A (en) * | 1995-02-24 | 1999-02-02 | Matsushita Electric Industrial Co., Ltd. | Multiprocessor system having processor based idle state detection and method of executing tasks in such a multiprocessor system |
US5727203A (en) * | 1995-03-31 | 1998-03-10 | Sun Microsystems, Inc. | Methods and apparatus for managing a database in a distributed object operating environment using persistent and transient cache |
US5799188A (en) * | 1995-12-15 | 1998-08-25 | International Business Machines Corporation | System and method for managing variable weight thread contexts in a multithreaded computer system |
US5892934A (en) * | 1996-04-02 | 1999-04-06 | Advanced Micro Devices, Inc. | Microprocessor configured to detect a branch to a DSP routine and to direct a DSP to execute said routine |
US5790871A (en) * | 1996-05-17 | 1998-08-04 | Advanced Micro Devices | System and method for testing and debugging a multiprocessing interrupt controller |
US5944816A (en) * | 1996-05-17 | 1999-08-31 | Advanced Micro Devices, Inc. | Microprocessor configured to execute multiple threads including interrupt service routines |
US5933627A (en) * | 1996-07-01 | 1999-08-03 | Sun Microsystems | Thread switch on blocked load or store using instruction thread field |
US5949994A (en) * | 1997-02-12 | 1999-09-07 | The Dow Chemical Company | Dedicated context-cycling computer with timed context |
US6175916B1 (en) * | 1997-05-06 | 2001-01-16 | Microsoft Corporation | Common-thread inter-process function calls invoked by jumps to invalid addresses |
US6240531B1 (en) * | 1997-09-30 | 2001-05-29 | Networks Associates Inc. | System and method for computer operating system protection |
US6697935B1 (en) * | 1997-10-23 | 2004-02-24 | International Business Machines Corporation | Method and apparatus for selecting thread switch events in a multithreaded processor |
US6061710A (en) * | 1997-10-29 | 2000-05-09 | International Business Machines Corporation | Multithreaded processor incorporating a thread latch register for interrupt service new pending threads |
US6088787A (en) * | 1998-03-30 | 2000-07-11 | Celestica International Inc. | Enhanced program counter stack for multi-tasking central processing unit |
US6560626B1 (en) * | 1998-04-02 | 2003-05-06 | Microsoft Corporation | Thread interruption with minimal resource usage using an asynchronous procedure call |
US7665088B1 (en) * | 1998-05-15 | 2010-02-16 | Vmware, Inc. | Context-switching to and from a host OS in a virtualized computer system |
US6189093B1 (en) * | 1998-07-21 | 2001-02-13 | Lsi Logic Corporation | System for initiating exception routine in response to memory access exception by storing exception information and exception bit within architectured register |
US6253306B1 (en) * | 1998-07-29 | 2001-06-26 | Advanced Micro Devices, Inc. | Prefetch instruction mechanism for processor |
US6920634B1 (en) * | 1998-08-03 | 2005-07-19 | International Business Machines Corporation | Detecting and causing unsafe latent accesses to a resource in multi-threaded programs |
US6223228B1 (en) * | 1998-09-17 | 2001-04-24 | Bull Hn Information Systems Inc. | Apparatus for synchronizing multiple processors in a data processing system |
US6205414B1 (en) * | 1998-10-02 | 2001-03-20 | International Business Machines Corporation | Methodology for emulation of multi-threaded processes in a single-threaded operating system |
US6286027B1 (en) * | 1998-11-30 | 2001-09-04 | Lucent Technologies Inc. | Two step thread creation with register renaming |
US6205543B1 (en) * | 1998-12-03 | 2001-03-20 | Sun Microsystems, Inc. | Efficient handling of a large register file for context switching |
US7020879B1 (en) * | 1998-12-16 | 2006-03-28 | Mips Technologies, Inc. | Interrupt and exception handling for multi-streaming digital processors |
US6401155B1 (en) * | 1998-12-22 | 2002-06-04 | Philips Electronics North America Corporation | Interrupt/software-controlled thread processing |
US7275246B1 (en) * | 1999-01-28 | 2007-09-25 | Ati International Srl | Executing programs for a first computer architecture on a computer of a second architecture |
US6687812B1 (en) * | 1999-04-20 | 2004-02-03 | Nec Corporation | Parallel processing apparatus |
US7185185B2 (en) * | 1999-05-11 | 2007-02-27 | Sun Microsystems, Inc. | Multiple-thread processor with in-pipeline, thread selectable storage |
US6675192B2 (en) * | 1999-10-01 | 2004-01-06 | Hewlett-Packard Development Company, L.P. | Temporary halting of thread execution until monitoring of armed events to memory location identified in working registers |
US6738796B1 (en) * | 1999-10-08 | 2004-05-18 | Globespanvirata, Inc. | Optimization of memory requirements for multi-threaded operating systems |
US20020083173A1 (en) * | 2000-02-08 | 2002-06-27 | Enrique Musoll | Method and apparatus for optimizing selection of available contexts for packet processing in multi-stream packet processing |
US6986140B2 (en) * | 2000-02-17 | 2006-01-10 | International Business Machines Corporation | Method for determining idle processor load balancing in a multiple processors system |
US20020016869A1 (en) * | 2000-06-22 | 2002-02-07 | Guillaume Comeau | Data path engine |
US6591379B1 (en) * | 2000-06-23 | 2003-07-08 | Microsoft Corporation | Method and system for injecting an exception to recover unsaved data |
US7065094B2 (en) * | 2000-07-05 | 2006-06-20 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and device in a coupling node for a telecommunication system |
US7031992B2 (en) * | 2000-09-08 | 2006-04-18 | Quartics, Inc. | Hardware function generator support in a DSP |
US20020083278A1 (en) * | 2000-12-22 | 2002-06-27 | Bull Hn Information Systems Inc. | Method and data processing system for performing atomic multiple word writes |
US20020091915A1 (en) * | 2001-01-11 | 2002-07-11 | Parady Bodo K. | Load prediction and thread identification in a multithreaded microprocessor |
US20020103847A1 (en) * | 2001-02-01 | 2002-08-01 | Hanan Potash | Efficient mechanism for inter-thread communication within a multi-threaded computer system |
US20030028755A1 (en) * | 2001-07-12 | 2003-02-06 | Nec Corporation | Interprocessor register succession method and device therefor |
US20030014471A1 (en) * | 2001-07-12 | 2003-01-16 | Nec Corporation | Multi-thread execution method and parallel processor system |
US20030018684A1 (en) * | 2001-07-18 | 2003-01-23 | Nec Corporation | Multi-thread execution method and parallel processor system |
US7185183B1 (en) * | 2001-08-02 | 2007-02-27 | Mips Technologies, Inc. | Atomic update of CPO state |
US7181600B1 (en) * | 2001-08-02 | 2007-02-20 | Mips Technologies, Inc. | Read-only access to CPO registers |
US6779065B2 (en) * | 2001-08-31 | 2004-08-17 | Intel Corporation | Mechanism for interrupt handling in computer systems that support concurrent execution of multiple threads |
US20030074545A1 (en) * | 2001-10-12 | 2003-04-17 | Uhler G. Michael | Method and apparatus for binding shadow registers to vectored interrupts |
US20030079094A1 (en) * | 2001-10-19 | 2003-04-24 | Ravi Rajwar | Concurrent execution of critical sections by eliding ownership of locks |
US20030093652A1 (en) * | 2001-11-14 | 2003-05-15 | Song Seungyoon Peter | Operand file using pointers and reference counters and a method of use |
US20030105796A1 (en) * | 2001-12-05 | 2003-06-05 | Sandri Jason G. | Method and apparatus for controlling access to shared resources in an environment with multiple logical processors |
US20030115245A1 (en) * | 2001-12-17 | 2003-06-19 | Kunimasa Fujisawa | Multi-application execution system and method thereof |
US20030126416A1 (en) * | 2001-12-31 | 2003-07-03 | Marr Deborah T. | Suspending execution of a thread in a multi-threaded processor |
US6925550B2 (en) * | 2002-01-02 | 2005-08-02 | Intel Corporation | Speculative scheduling of instructions with source operand validity bit and rescheduling upon carried over destination operand invalid bit detection |
US7216338B2 (en) * | 2002-02-20 | 2007-05-08 | Microsoft Corporation | Conformance execution of non-deterministic specifications for components |
US6922745B2 (en) * | 2002-05-02 | 2005-07-26 | Intel Corporation | Method and apparatus for handling locks |
US20040015684A1 (en) * | 2002-05-30 | 2004-01-22 | International Business Machines Corporation | Method, apparatus and computer program product for scheduling multiple threads for a processor |
US20050055504A1 (en) * | 2002-10-08 | 2005-03-10 | Hass David T. | Advanced processor with system on a chip interconnect technology |
US20040073910A1 (en) * | 2002-10-15 | 2004-04-15 | Erdem Hokenek | Method and apparatus for high speed cross-thread interrupts in a multithreaded processor |
US7073042B2 (en) * | 2002-12-12 | 2006-07-04 | Intel Corporation | Reclaiming existing fields in address translation data structures to extend control over memory accesses |
US7203823B2 (en) * | 2003-01-09 | 2007-04-10 | Sony Corporation | Partial and start-over threads in embedded real-time kernel |
US20040139306A1 (en) * | 2003-01-09 | 2004-07-15 | Sony Corporation | Partial and start-over threads in embedded real-time kernel |
US7093106B2 (en) * | 2003-04-23 | 2006-08-15 | International Business Machines Corporation | Register rename array with individual thread bits set upon allocation and cleared upon instruction completion |
US20060161421A1 (en) * | 2003-08-28 | 2006-07-20 | Mips Technologies, Inc. | Software emulation of directed exceptions in a multithreading processor |
US20050125629A1 (en) * | 2003-08-28 | 2005-06-09 | Mips Technologies, Inc. | Mechanisms for dynamic configuration of virtual processor resources |
US20060195683A1 (en) * | 2003-08-28 | 2006-08-31 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US7676660B2 (en) * | 2003-08-28 | 2010-03-09 | Mips Technologies, Inc. | System, method, and computer program product for conditionally suspending issuing instructions of a thread |
US20060161921A1 (en) * | 2003-08-28 | 2006-07-20 | Mips Technologies, Inc. | Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor |
US20070043935A2 (en) * | 2003-08-28 | 2007-02-22 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20070044105A2 (en) * | 2003-08-28 | 2007-02-22 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20060190945A1 (en) * | 2003-08-28 | 2006-08-24 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread context |
US20050125795A1 (en) * | 2003-08-28 | 2005-06-09 | Mips Technologies, Inc. | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
US20080140998A1 (en) * | 2003-08-28 | 2008-06-12 | Mips Technologies, Inc. | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
US20050120194A1 (en) * | 2003-08-28 | 2005-06-02 | Mips Technologies, Inc. | Apparatus, method, and instruction for initiation of concurrent instruction streams in a multithreading microprocessor |
US20070106989A1 (en) * | 2003-08-28 | 2007-05-10 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20070106990A1 (en) * | 2003-08-28 | 2007-05-10 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20070106988A1 (en) * | 2003-08-28 | 2007-05-10 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20070106887A1 (en) * | 2003-08-28 | 2007-05-10 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20070186028A2 (en) * | 2003-08-28 | 2007-08-09 | Mips Technologies, Inc. | Synchronized storage providing multiple synchronization semantics |
US20050050305A1 (en) * | 2003-08-28 | 2005-03-03 | Kissell Kevin D. | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
US7321965B2 (en) * | 2003-08-28 | 2008-01-22 | Mips Technologies, Inc. | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
US7376954B2 (en) * | 2003-08-28 | 2008-05-20 | Mips Technologies, Inc. | Mechanisms for assuring quality of service for programs executing on a multithreaded processor |
US20050050395A1 (en) * | 2003-08-28 | 2005-03-03 | Kissell Kevin D. | Mechanisms for assuring quality of service for programs executing on a multithreaded processor |
US6993598B2 (en) * | 2003-10-09 | 2006-01-31 | International Business Machines Corporation | Method and apparatus for efficient sharing of DMA resource |
US20060206686A1 (en) * | 2005-03-08 | 2006-09-14 | Mips Technologies, Inc. | Three-tiered translation lookaside buffer hierarchy in a multithreading microprocessor |
US7689867B2 (en) * | 2005-06-09 | 2010-03-30 | Intel Corporation | Multiprocessor breakpoint |
US7386636B2 (en) * | 2005-08-19 | 2008-06-10 | International Business Machines Corporation | System and method for communicating command parameters between a processor and a memory flow controller |
US7657683B2 (en) * | 2008-02-01 | 2010-02-02 | Redpine Signals, Inc. | Cross-thread interrupt controller for a multi-thread processor |
Cited By (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7711931B2 (en) | 2003-08-28 | 2010-05-04 | Mips Technologies, Inc. | Synchronized storage providing multiple synchronization semantics |
US20060190945A1 (en) * | 2003-08-28 | 2006-08-24 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread context |
US20050125629A1 (en) * | 2003-08-28 | 2005-06-09 | Mips Technologies, Inc. | Mechanisms for dynamic configuration of virtual processor resources |
US20050050305A1 (en) * | 2003-08-28 | 2005-03-03 | Kissell Kevin D. | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
US20050251613A1 (en) * | 2003-08-28 | 2005-11-10 | Mips Technologies, Inc., A Delaware Corporation | Synchronized storage providing multiple synchronization semantics |
US7725689B2 (en) | 2003-08-28 | 2010-05-25 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20060161921A1 (en) * | 2003-08-28 | 2006-07-20 | Mips Technologies, Inc. | Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor |
US7725697B2 (en) | 2003-08-28 | 2010-05-25 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20070044105A2 (en) * | 2003-08-28 | 2007-02-22 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20070106887A1 (en) * | 2003-08-28 | 2007-05-10 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20070106988A1 (en) * | 2003-08-28 | 2007-05-10 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20070106990A1 (en) * | 2003-08-28 | 2007-05-10 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US20070186028A2 (en) * | 2003-08-28 | 2007-08-09 | Mips Technologies, Inc. | Synchronized storage providing multiple synchronization semantics |
US20080140998A1 (en) * | 2003-08-28 | 2008-06-12 | Mips Technologies, Inc. | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
US9032404B2 (en) | 2003-08-28 | 2015-05-12 | Mips Technologies, Inc. | Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor |
US7594089B2 (en) | 2003-08-28 | 2009-09-22 | Mips Technologies, Inc. | Smart memory based synchronization controller for a multi-threaded multiprocessor SoC |
US7610473B2 (en) | 2003-08-28 | 2009-10-27 | Mips Technologies, Inc. | Apparatus, method, and instruction for initiation of concurrent instruction streams in a multithreading microprocessor |
US7676660B2 (en) | 2003-08-28 | 2010-03-09 | Mips Technologies, Inc. | System, method, and computer program product for conditionally suspending issuing instructions of a thread |
US7676664B2 (en) | 2003-08-28 | 2010-03-09 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US7694304B2 (en) | 2003-08-28 | 2010-04-06 | Mips Technologies, Inc. | Mechanisms for dynamic configuration of virtual processor resources |
US20050251639A1 (en) * | 2003-08-28 | 2005-11-10 | Mips Technologies, Inc. A Delaware Corporation | Smart memory based synchronization controller for a multi-threaded multiprocessor SoC |
US20050120194A1 (en) * | 2003-08-28 | 2005-06-02 | Mips Technologies, Inc. | Apparatus, method, and instruction for initiation of concurrent instruction streams in a multithreading microprocessor |
US20060161421A1 (en) * | 2003-08-28 | 2006-07-20 | Mips Technologies, Inc. | Software emulation of directed exceptions in a multithreading processor |
US7730291B2 (en) | 2003-08-28 | 2010-06-01 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US7836450B2 (en) | 2003-08-28 | 2010-11-16 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US7849297B2 (en) | 2003-08-28 | 2010-12-07 | Mips Technologies, Inc. | Software emulation of directed exceptions in a multithreading processor |
US20110040956A1 (en) * | 2003-08-28 | 2011-02-17 | Mips Technologies, Inc. | Symmetric Multiprocessor Operating System for Execution On Non-Independent Lightweight Thread Contexts |
US8266620B2 (en) | 2003-08-28 | 2012-09-11 | Mips Technologies, Inc. | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US8145884B2 (en) | 2003-08-28 | 2012-03-27 | Mips Technologies, Inc. | Apparatus, method and instruction for initiation of concurrent instruction streams in a multithreading microprocessor |
US8069354B2 (en) | 2007-08-14 | 2011-11-29 | Mips Technologies, Inc. | Power management for system having one or more integrated circuits |
US8214628B2 (en) * | 2008-03-14 | 2012-07-03 | Fujitsu Limited | Computer performance monitoring by associating counter values with particular processes when an interrupt is detected |
US20090235056A1 (en) * | 2008-03-14 | 2009-09-17 | Fujitsu Limited | Recording medium storing performance monitoring program, performance monitoring method, and performance monitoring device |
US20110113220A1 (en) * | 2008-06-19 | 2011-05-12 | Hiroyuki Morishita | Multiprocessor |
US8433884B2 (en) | 2008-06-19 | 2013-04-30 | Panasonic Corporation | Multiprocessor |
US20120008674A1 (en) * | 2009-02-17 | 2012-01-12 | Panasonic Corporation | Multithread processor and digital television system |
WO2014031495A3 (en) * | 2012-08-18 | 2014-07-17 | Qualcomm Technologies, Inc. | Translation look-aside buffer with prefetching |
US9141556B2 (en) | 2012-08-18 | 2015-09-22 | Qualcomm Technologies, Inc. | System translation look-aside buffer with request-based allocation and prefetching |
US9396130B2 (en) | 2012-08-18 | 2016-07-19 | Qualcomm Technologies, Inc. | System translation look-aside buffer integrated in an interconnect |
US9465749B2 (en) | 2012-08-18 | 2016-10-11 | Qualcomm Technologies, Inc. | DMA engine with STLB prefetch capabilities and tethered prefetching |
US9852081B2 (en) | 2012-08-18 | 2017-12-26 | Qualcomm Incorporated | STLB prefetching for a multi-dimension engine |
US10951475B2 (en) * | 2019-06-28 | 2021-03-16 | Intel Corporation | Technologies for transmit scheduler dynamic configurations |
Also Published As
Publication number | Publication date |
---|---|
US7870553B2 (en) | 2011-01-11 |
US20060190946A1 (en) | 2006-08-24 |
Similar Documents
Publication | Title |
---|---|
US7836450B2 (en) | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US7418585B2 (en) | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US7870553B2 (en) | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts |
US9032404B2 (en) | Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor |
US7849297B2 (en) | Software emulation of directed exceptions in a multithreading processor |
US7424599B2 (en) | Apparatus, method, and instruction for software management of multiple computational contexts in a multithreaded microprocessor |
EP1570352B1 (en) | Method and apparatus for switching between processes |
US10061588B2 (en) | Tracking operand liveness information in a computer system and performing function based on the liveness information |
CA2508044C (en) | Cross partition sharing of state information |
US7376954B2 (en) | Mechanisms for assuring quality of service for programs executing on a multithreaded processor |
US20050050305A1 (en) | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
EP1570353A2 (en) | Enhanced processor virtualization mechanism via saving and restoring soft processor/system states |
WO2005022384A1 (en) | Apparatus, method, and instruction for initiation of concurrent instruction streams in a multithreading microprocessor |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: MIPS TECHNOLOGIES, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:S.A.R.L., PARALOGOS;KISSELL, KEVIN D.;REEL/FRAME:017572/0416 Effective date: 20060207 |
AS | Assignment | Owner name: JEFFERIES FINANCE LLC, AS COLLATERAL AGENT, NEW YO Free format text: SECURITY AGREEMENT;ASSIGNOR:MIPS TECHNOLOGIES, INC.;REEL/FRAME:019744/0001 Effective date: 20070824 Owner name: JEFFERIES FINANCE LLC, AS COLLATERAL AGENT,NEW YOR Free format text: SECURITY AGREEMENT;ASSIGNOR:MIPS TECHNOLOGIES, INC.;REEL/FRAME:019744/0001 Effective date: 20070824 |
AS | Assignment | Owner name: MIPS TECHNOLOGIES, INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JEFFERIES FINANCE LLC, AS COLLATERAL AGENT;REEL/FRAME:021985/0015 Effective date: 20081205 Owner name: MIPS TECHNOLOGIES, INC.,CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JEFFERIES FINANCE LLC, AS COLLATERAL AGENT;REEL/FRAME:021985/0015 Effective date: 20081205 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
FPAY | Fee payment | Year of fee payment: 4 |
AS | Assignment | Owner name: IMAGINATION TECHNOLOGIES, LLC, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:MIPS TECHNOLOGIES, INC.;REEL/FRAME:042375/0221 Effective date: 20140310 |
AS | Assignment | Owner name: MIPS TECH LIMITED, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:HELLOSOFT LIMITED;REEL/FRAME:045146/0514 Effective date: 20171108 Owner name: MIPS TECH LIMITED, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:IMAGINATION TECHNOLOGIES, LLC;REEL/FRAME:045536/0408 Effective date: 20171107 |
AS | Assignment | Owner name: MIPS TECH, LLC, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:IMAGINATION TECHNOLOGIES, LLC;REEL/FRAME:046249/0128 Effective date: 20171107 |
FEPP | Fee payment procedure | Free format text: 7.5 YR SURCHARGE - LATE PMT W/IN 6 MO, LARGE ENTITY (ORIGINAL EVENT CODE: M1555) |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552) Year of fee payment: 8 |
AS | Assignment | Owner name: MIPS TECH, LLC, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:MIPS TECH LIMITED;REEL/FRAME:046749/0515 Effective date: 20171107 |
AS | Assignment | Owner name: MIPS TECH, LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MIPS TECH LIMITED;REEL/FRAME:046857/0640 Effective date: 20180216 |
AS | Assignment | Owner name: WAVE COMPUTING LIQUIDATING TRUST, CALIFORNIA Free format text: SECURITY INTEREST;ASSIGNORS:WAVE COMPUTING, INC.;MIPS TECH, LLC;MIPS TECH, INC.;AND OTHERS;REEL/FRAME:055429/0532 Effective date: 20210226 |
AS | Assignment | Owner name: CAPITAL FINANCE ADMINISTRATION, LLC, ILLINOIS Free format text: SECURITY INTEREST;ASSIGNORS:MIPS TECH, LLC;WAVE COMPUTING, INC.;REEL/FRAME:056558/0903 Effective date: 20210611 Owner name: MIPS TECH, INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WAVE COMPUTING LIQUIDATING TRUST;REEL/FRAME:056589/0606 Effective date: 20210611 Owner name: HELLOSOFT, INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WAVE COMPUTING LIQUIDATING TRUST;REEL/FRAME:056589/0606 Effective date: 20210611 Owner name: WAVE COMPUTING (UK) LIMITED, CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WAVE COMPUTING LIQUIDATING TRUST;REEL/FRAME:056589/0606 Effective date: 20210611 Owner name: IMAGINATION TECHNOLOGIES, INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WAVE COMPUTING LIQUIDATING TRUST;REEL/FRAME:056589/0606 Effective date: 20210611 Owner name: CAUSTIC GRAPHICS, INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WAVE COMPUTING LIQUIDATING TRUST;REEL/FRAME:056589/0606 Effective date: 20210611 Owner name: MIPS TECH, LLC, CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WAVE COMPUTING LIQUIDATING TRUST;REEL/FRAME:056589/0606 Effective date: 20210611 Owner name: WAVE COMPUTING, INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WAVE COMPUTING LIQUIDATING TRUST;REEL/FRAME:056589/0606 Effective date: 20210611 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 12 |
AS | Assignment | Owner name: WAVE COMPUTING INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CAPITAL FINANCE ADMINISTRATION, LLC, AS ADMINISTRATIVE AGENT;REEL/FRAME:062251/0251 Effective date: 20221229 Owner name: MIPS TECH, LLC, CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CAPITAL FINANCE ADMINISTRATION, LLC, AS ADMINISTRATIVE AGENT;REEL/FRAME:062251/0251 Effective date: 20221229 |