US20110320781A1 - Dynamic data synchronization in thread-level speculation - Google Patents
- Publication number
- US20110320781A1
- Authority
- US
- United States
- Prior art keywords
- synchronization
- processor
- dependence
- instructions
- bits
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/3834—Maintaining memory consistency
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Advance Control (AREA)
Abstract
In one embodiment, the present invention introduces a speculation engine to parallelize serial instructions by creating separate threads from the serial instructions and inserting processor instructions to set a synchronization bit before a dependence source and to clear the synchronization bit after a dependence source, where the synchronization bit is designed to stall a dependence sink from a thread running on a separate core. Other embodiments are described and claimed.
Description
- In modern processors, it is common to have multiple computing cores capable of executing in parallel. However, many sequential or serial applications and programs fail to exploit parallel architectures effectively. Thread-level speculation (TLS) is a promising technique to parallelize sequential programs with static or dynamic compilers and hardware to recover if mis-speculation happens. Without proper synchronization, however, between dependent load and store instructions, for example, loads may execute before stores and cause data violations that squash the speculative threads and require re-execution with re-loaded data.
- FIG. 1 is a block diagram of an example system in accordance with one embodiment of the present invention.
- FIG. 2 is a block diagram of an example speculation engine in accordance with an embodiment of the present invention.
- FIGS. 3A and 3B are block diagrams of example software code in accordance with an embodiment of the present invention.
- FIG. 4 is a flow chart for dynamic data synchronization in thread-level speculation in accordance with an embodiment of the present invention.
- FIG. 5 is a block diagram of a system in accordance with an embodiment of the present invention.
- In various embodiments, a processor is introduced with a speculative cache with synchronization bits that, when set, can stall a read of the cache line or word. One skilled in the art would recognize that this may prevent mis-speculation and the associated inefficiencies of squashed threads. Also presented are processor instructions to set and clear the synchronization bits. Compilers may take advantage of these instructions to synchronize data dependencies. The present invention is intended to be practiced in processors and systems that may include additional parallelization and/or thread speculation features.
- Referring now to FIG. 1, shown is a block diagram of an example system in accordance with one embodiment of the present invention. As shown in FIG. 1, system 100 may include processor 102 and memory 104, such as dynamic random access memory (DRAM). Processor 102 may include cores 106-110, speculative cache 112 and speculation engine 118. Cores 106-110 may be able to execute instructions independently from one another and may include any type of architecture. While shown as including three cores, processor 102 may have any number of cores and may include other components or controllers, not shown. In one embodiment, processor 102 is a system on a chip (SOC).
- Speculative cache 112 may include any number of separate caches and may contain any number of entries. While intended as a low latency level one cache, speculative cache 112 may be implemented in any memory technology at any hierarchical level. Speculative cache 112 includes synchronization bit 114 associated with cache line or word 116. When synchronization bit 114 is set, as described in greater detail hereinafter, line or word 116 would not be able to be loaded by a core, because, for example, another core may be about to perform a store upon which the load depends. In one embodiment, a core trying to load from cache line or word 116 when synchronization bit 114 is set would stall until synchronization bit 114 is cleared.
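The stall behavior described for synchronization bit 114 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the class `SpeculativeCache`, its `store`/`load` methods, the `timeout` parameter, and the `demo` helper are hypothetical names introduced here; only `mark_comm_addr` and `clear_comm_addr` come from the text, and a Python condition variable stands in for a hardware stall.

```python
import threading


class SpeculativeCache:
    """Toy model: each address (cache word) carries a synchronization bit."""

    def __init__(self):
        self._data = {}          # address -> value
        self._sync_bits = set()  # addresses whose synchronization bit is set
        self._cond = threading.Condition()

    def mark_comm_addr(self, addr):
        """Set the synchronization bit ahead of a dependence source (a store)."""
        with self._cond:
            self._sync_bits.add(addr)

    def clear_comm_addr(self, addr):
        """Clear the synchronization bit and wake any stalled loads."""
        with self._cond:
            self._sync_bits.discard(addr)
            self._cond.notify_all()

    def store(self, addr, value):
        with self._cond:
            self._data[addr] = value

    def load(self, addr, timeout=None):
        """Stall while the bit for addr is set, then return the stored value."""
        with self._cond:
            bit_clear = self._cond.wait_for(
                lambda: addr not in self._sync_bits, timeout
            )
            if not bit_clear:
                raise TimeoutError(f"load of {addr:#x} still stalled")
            return self._data.get(addr)


def demo():
    cache = SpeculativeCache()
    cache.store(0x40, "stale")
    cache.mark_comm_addr(0x40)      # producer announces a pending store

    seen = []
    consumer = threading.Thread(target=lambda: seen.append(cache.load(0x40)))
    consumer.start()                # the load stalls: the bit is set

    cache.store(0x40, "fresh")      # dependence source completes
    cache.clear_comm_addr(0x40)     # stalled load may now proceed
    consumer.join()
    return seen[0]


if __name__ == "__main__":
    print(demo())
```

Because the bit is set before the consumer thread starts and cleared only after the store, the consumer in this sketch cannot observe the old value. The model does not cover the case where a load runs before the bit has been set.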
- Speculation engine 118 may implement a method for dynamic data synchronization in thread-level speculation, for example as described in reference to FIG. 4, and may have an architecture as described in reference to FIG. 2. Speculation engine 118 may be separate from processor 102 and may be implemented in hardware, software or a combination of hardware and software.
- Referring now to FIG. 2, shown is a block diagram of an example speculation engine in accordance with an embodiment of the present invention. As shown in FIG. 2, speculation engine 118 may include parallelize services 202, parallel output code 204 and serial input code 206. Parallelize services 202 may provide speculation engine 118 with the ability to parallelize serial instructions and add dynamic data synchronization in thread-level speculation.
- Parallelize services 202 may include thread services 208, synchronization set services 210, and synchronization clear services 212, which may create parallel threads from serial instructions, insert processor instructions to set synchronization bits before dependence sources, and insert processor instructions to clear synchronization bits after dependence sources, respectively. Parallelize services 202 may create parallel output code 204 (for example as shown in FIG. 3B) from serial input code 206 (for example as shown in FIG. 3A).
- Referring now to FIGS. 3A and 3B, shown are block diagrams of example software code in accordance with an embodiment of the present invention. As shown in FIG. 3A, sequential instructions 300 include various loads and stores that progress serially and are intended to be executed by a single core of a processor. Sequential instructions 300 may serve as serial input code 206 of speculation engine 118. As shown in FIG. 3B, parallel instructions 302 may represent parallel output code 204 of speculation engine 118. Threads 304-308 may be able to be executed separately by cores 106-110.
- Threads 304-308 may each include a processor instruction (mark_comm_addr for example) which, when executed, sets the synchronization bit 114 for a particular cache line or word 116 before a dependence source, such as a store instruction. Threads 304-308 may also each include a corresponding processor instruction (clear_comm_addr for example) which, when executed, clears the synchronization bit 114 after the dependence source. An example of a data dependence can be seen in threads 304 and 308, where a dependence sink would have to wait for a dependence source to complete and clear the synchronization bit. In this case load 310 would stall the progress of thread 308 until store 312 is completed and thread 304 clears the associated synchronization bit.
- Referring now to FIG. 4, shown is a flow chart for dynamic data synchronization in thread-level speculation in accordance with an embodiment of the present invention. As shown in FIG. 4, the method begins with creating (402) parallel threads from serial instructions. In one embodiment, thread services 208 is invoked to generate parallel instructions 302 from sequential instructions 300. In another embodiment, the number of threads (304-308) generated is based at least in part on the number of cores (106-110) in a processor.
- The method continues with inserting (404) processor instructions to set and clear synchronization bits. In one embodiment, synchronization set services 210 inserts instructions (mark_comm_addr) into threads 304-308 at an early point before the dependence source or potential dependence source when an address is generated. In another embodiment, synchronization clear services 212 inserts instructions (clear_comm_addr) into threads 304-308 after the dependence source or potential dependence source.
- The method concludes with executing (406) the parallel threads on cores of a multi-core processor. In one embodiment, threads 304-308 are executed on cores 106-110, respectively. In one embodiment, the execution of core 110 may stall on load 310 until synchronization bit 114 is cleared by thread 304 executing on core 106.
- Embodiments may be implemented in many different system types. Referring now to FIG. 5, shown is a block diagram of a system in accordance with an embodiment of the present invention. As shown in FIG. 5, multiprocessor system 500 is a point-to-point interconnect system, and includes a first processor 570 and a second processor 580 coupled via a point-to-point interconnect 550. As shown in FIG. 5, each of processors 570 and 580 may be multicore processors, including first and second processor cores (i.e., processor cores 574a and 574b and processor cores 584a and 584b). Each processor may include dynamic data synchronization thread-level speculation hardware, software, and firmware in accordance with an embodiment of the present invention.
- Still referring to FIG. 5, first processor 570 further includes a memory controller hub (MCH) 572 and point-to-point (P-P) interfaces 576 and 578. Similarly, second processor 580 includes an MCH 582 and P-P interfaces 586 and 588. As shown in FIG. 5, MCHs 572 and 582 couple the processors to respective memories, namely a memory 532 and a memory 534, which may be portions of main memory (e.g., a dynamic random access memory (DRAM)) locally attached to the respective processors, each of which may include extended page tables in accordance with one embodiment of the present invention. First processor 570 and second processor 580 may be coupled to a chipset 590 via P-P interconnects 552 and 554, respectively. As shown in FIG. 5, chipset 590 includes P-P interfaces 594 and 598.
- Furthermore, chipset 590 includes an interface 592 to couple chipset 590 with a high performance graphics engine 538. In turn, chipset 590 may be coupled to a first bus 516 via an interface 596. As shown in FIG. 5, various I/O devices 514 may be coupled to first bus 516, along with a bus bridge 518 which couples first bus 516 to a second bus 520. Various devices may be coupled to second bus 520 including, for example, a keyboard/mouse 522, communication devices 526 and a data storage unit 528 such as a disk drive or other mass storage device which may include code 530, in one embodiment. Further, an audio I/O 524 may be coupled to second bus 520.
- Embodiments may be implemented in code and may be stored on a storage medium having stored thereon instructions which can be used to program a system to perform the instructions. The storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs), static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, or any other type of media suitable for storing electronic instructions.
- While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.
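The insertion step (404) of the method described above can be illustrated with a small compiler-style pass. This is a hedged sketch, not the speculation engine itself. It rests on several assumptions made only for the example: threads are modeled as lists of `(opcode, address)` tuples, a store counts as a dependence source when a later thread in program order loads the same address, and every address is already known at thread entry, so the set instruction is hoisted to the top of the thread as the "early point". The names `insert_sync`, `Instr`, and `EXAMPLE_THREADS` are hypothetical; `mark_comm_addr` and `clear_comm_addr` are the instruction names given in the text.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, int]  # (opcode, address); a simplified instruction form


def insert_sync(threads: List[List[Instr]]) -> List[List[Instr]]:
    """Return copies of the threads with set/clear instructions inserted.

    Assumption: a store is a dependence source if any later thread
    (in program order) loads the same address.
    """
    result = []
    for i, thread in enumerate(threads):
        # Addresses that later threads read: stores to them are sources.
        later_loads = {
            addr
            for later in threads[i + 1:]
            for op, addr in later
            if op == "load"
        }

        # Index of the last store to each shared address in this thread.
        last_store: Dict[int, int] = {}
        for idx, (op, addr) in enumerate(thread):
            if op == "store" and addr in later_loads:
                last_store[addr] = idx

        # Set the synchronization bits at an early point (thread entry).
        new_thread: List[Instr] = [
            ("mark_comm_addr", addr) for addr in last_store
        ]

        # Copy the body, clearing each bit right after its final store.
        for idx, (op, addr) in enumerate(thread):
            new_thread.append((op, addr))
            if op == "store" and last_store.get(addr) == idx:
                new_thread.append(("clear_comm_addr", addr))

        result.append(new_thread)
    return result


# Three threads; the last one loads an address the first one stores.
EXAMPLE_THREADS: List[List[Instr]] = [
    [("load", 0x10), ("store", 0x20)],
    [("load", 0x30), ("store", 0x30)],
    [("load", 0x20), ("store", 0x40)],
]

if __name__ == "__main__":
    for number, thread in enumerate(insert_sync(EXAMPLE_THREADS)):
        print(number, thread)
```

In this example only the first thread is changed: it gains a `mark_comm_addr` for address `0x20` at entry and a `clear_comm_addr` right after its store to that address. A real pass would also have to handle potential dependence sources and addresses that are computed partway through a thread, which this sketch leaves out.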
Claims (20)
1. A storage medium comprising content which, when executed by an accessing machine, causes the accessing machine to:
execute instructions in a first core of a multi-core processor;
determine an address of a data in a speculative cache as part of a dependence sink; and
wait to access the data if a synchronization bit associated with the data has been set by a dependence source in a second core.
2. The storage medium of claim 1 , further comprising content which, when executed by an accessing machine, causes the accessing machine to set the synchronization bit by executing a processor instruction.
3. The storage medium of claim 2 , further comprising content which, when executed by an accessing machine, causes the accessing machine to clear the synchronization bit by executing a processor instruction.
4. The storage medium of claim 3 , wherein the dependence sink comprises a load instruction.
5. The storage medium of claim 3 , wherein the dependence source comprises a store instruction.
6. The storage medium of claim 3 , wherein the synchronization bit associated with the data comprises a cache line bit.
7. The storage medium of claim 3 , wherein the synchronization bit associated with the data comprises a cache word bit.
8. The storage medium of claim 3 , wherein the content to set the synchronization bit by executing a processor instruction comprises content to set the synchronization bit when a dependence source address is generated.
9. A system comprising:
a processor including a first core and a second core to execute instructions;
a speculative cache to store data and instructions for the processor, the speculative cache including synchronization bits to indicate if associated data is subject to a dependence source and to stall dependence sink operations when a synchronization bit is set;
a dynamic random access memory (DRAM) coupled to the processor, the DRAM to store serial instructions; and
a speculation engine, the speculation engine to parallelize the serial instructions by creating separate threads and inserting processor instructions to set the synchronization bits before a dependence source.
10. The system of claim 9 , further comprising the speculation engine to insert corresponding processor instructions to clear the synchronization bits after a dependence source.
11. The system of claim 10 , wherein the dependence source comprises a store instruction.
12. The system of claim 10 , wherein the dependence sink comprises a load instruction.
13. The system of claim 9 , wherein the synchronization bits comprise cache line bits.
14. The system of claim 9 , wherein the synchronization bits comprise cache word bits.
15. A method performed by a specialized speculation engine comprising:
creating parallelized threads from a set of serial instructions;
inserting processor instructions in the threads to set synchronization bits before a dependence source and to clear the synchronization bits after the dependence source, wherein the synchronization bits are designed to stall a dependence sink when set; and
executing the parallelized threads on cores of a multi-core processor.
16. The method of claim 15 , wherein the dependence source comprises a store instruction.
17. The method of claim 15 , wherein the dependence sink comprises a load instruction.
18. The method of claim 15 , wherein the synchronization bits comprise cache line bits.
19. The method of claim 15 , wherein the synchronization bits comprise cache word bits.
20. The method of claim 15 , wherein inserting processor instructions in the threads to set synchronization bits before a dependence source comprises inserting a processor instruction to set the synchronization bit when a dependence source address is generated.
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/826,287 US20110320781A1 (en) | 2010-06-29 | 2010-06-29 | Dynamic data synchronization in thread-level speculation |
JP2013513423A JP2013527549A (en) | 2010-06-29 | 2011-06-27 | Dynamic data synchronization in thread-level speculation |
CN201180027637.4A CN103003796B (en) | 2010-06-29 | 2011-06-27 | Dynamic data synchronization in thread-level supposition |
PCT/US2011/042040 WO2012006030A2 (en) | 2010-06-29 | 2011-06-27 | Dynamic data synchronization in thread-level speculation |
EP11804093.0A EP2588959A4 (en) | 2010-06-29 | 2011-06-27 | Dynamic data synchronization in thread-level speculation |
AU2011276588A AU2011276588A1 (en) | 2010-06-29 | 2011-06-27 | Dynamic data synchronization in thread-level speculation |
KR1020127034256A KR101460985B1 (en) | 2010-06-29 | 2011-06-27 | Dynamic data synchronization in thread-level speculation |
TW100122652A TWI512611B (en) | 2010-06-29 | 2011-06-28 | Dynamic data synchronization in thread-level speculation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/826,287 US20110320781A1 (en) | 2010-06-29 | 2010-06-29 | Dynamic data synchronization in thread-level speculation |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110320781A1 (en) | 2011-12-29 |
Family ID=45353688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/826,287 Abandoned US20110320781A1 (en) | 2010-06-29 | 2010-06-29 | Dynamic data synchronization in thread-level speculation |
Country Status (8)
Country | Link |
---|---|
US (1) | US20110320781A1 (en) |
EP (1) | EP2588959A4 (en) |
JP (1) | JP2013527549A (en) |
KR (1) | KR101460985B1 (en) |
CN (1) | CN103003796B (en) |
AU (1) | AU2011276588A1 (en) |
TW (1) | TWI512611B (en) |
WO (1) | WO2012006030A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9811343B2 (en) * | 2013-06-07 | 2017-11-07 | Advanced Micro Devices, Inc. | Method and system for yield operation supporting thread-like behavior |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112130898B (en) * | 2019-06-24 | 2024-09-24 | 华为技术有限公司 | Method and device for inserting synchronous instruction |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5655096A (en) * | 1990-10-12 | 1997-08-05 | Branigin; Michael H. | Method and apparatus for dynamic scheduling of instructions to ensure sequentially coherent data in a processor employing out-of-order execution |
US6282637B1 (en) * | 1998-12-02 | 2001-08-28 | Sun Microsystems, Inc. | Partially executing a pending atomic instruction to unlock resources when cancellation of the instruction occurs |
US6785803B1 (en) * | 1996-11-13 | 2004-08-31 | Intel Corporation | Processor including replay queue to break livelocks |
US20050177831A1 (en) * | 2004-02-10 | 2005-08-11 | Goodman James R. | Computer architecture providing transactional, lock-free execution of lock-based programs |
US20050240930A1 (en) * | 2004-03-30 | 2005-10-27 | Kyushu University | Parallel processing computer |
US20060294326A1 (en) * | 2005-06-23 | 2006-12-28 | Jacobson Quinn A | Primitives to enhance thread-level speculation |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7257814B1 (en) | 1998-12-16 | 2007-08-14 | Mips Technologies, Inc. | Method and apparatus for implementing atomicity of memory operations in dynamic multi-streaming processors |
WO2001061480A1 (en) | 2000-02-14 | 2001-08-23 | Intel Corporation | Processor having replay architecture with fast and slow replay paths |
US6862664B2 (en) * | 2003-02-13 | 2005-03-01 | Sun Microsystems, Inc. | Method and apparatus for avoiding locks by speculatively executing critical sections |
US20060143384A1 (en) * | 2004-12-27 | 2006-06-29 | Hughes Christopher J | System and method for non-uniform cache in a multi-core processor |
US7587555B2 (en) * | 2005-11-10 | 2009-09-08 | Hewlett-Packard Development Company, L.P. | Program thread synchronization |
US7930695B2 (en) * | 2006-04-06 | 2011-04-19 | Oracle America, Inc. | Method and apparatus for synchronizing threads on a processor that supports transactional memory |
DE112006003917T5 (en) * | 2006-05-30 | 2009-06-04 | Intel Corporation, Santa Clara | Method, device and system applied in a cache coherency protocol |
US8719807B2 (en) * | 2006-12-28 | 2014-05-06 | Intel Corporation | Handling precompiled binaries in a hardware accelerated software transactional memory system |
KR101086791B1 (en) * | 2007-06-20 | 2011-11-25 | 후지쯔 가부시끼가이샤 | Cache control device and control method |
US8855138B2 (en) * | 2008-08-25 | 2014-10-07 | Qualcomm Incorporated | Relay architecture framework |
JP5320618B2 (en) * | 2008-10-02 | 2013-10-23 | 株式会社日立製作所 | Route control method and access gateway apparatus |
US8732407B2 (en) * | 2008-11-19 | 2014-05-20 | Oracle America, Inc. | Deadlock avoidance during store-mark acquisition |
CN101657028B (en) * | 2009-09-10 | 2011-09-28 | 新邮通信设备有限公司 | Method, device and system for establishing S1 interface connection |
2010
- 2010-06-29: US application US12/826,287, published as US20110320781A1 (en); not active, Abandoned
2011
- 2011-06-27: JP application JP2013513423A, published as JP2013527549A (en); active, Pending
- 2011-06-27: CN application CN201180027637.4A, published as CN103003796B (en); not active, Expired - Fee Related
- 2011-06-27: EP application EP11804093.0A, published as EP2588959A4 (en); not active, Withdrawn
- 2011-06-27: AU application AU2011276588A, published as AU2011276588A1 (en); not active, Abandoned
- 2011-06-27: WO application PCT/US2011/042040, published as WO2012006030A2 (en); active, Application Filing
- 2011-06-27: KR application KR1020127034256A, published as KR101460985B1 (en); not active, IP Right Cessation
- 2011-06-28: TW application TW100122652A, published as TWI512611B (en); not active, IP Right Cessation
Non-Patent Citations (1)
Title |
---|
Cintra et al., "Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors"; Proceedings of the Eighth International Symposium on High-Performance Computer Architecture, 2002; Date of Conference: 2-6 Feb. 2002; pp. 43-54 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9811343B2 (en) * | 2013-06-07 | 2017-11-07 | Advanced Micro Devices, Inc. | Method and system for yield operation supporting thread-like behavior |
US10146549B2 (en) | 2013-06-07 | 2018-12-04 | Advanced Micro Devices, Inc. | Method and system for yield operation supporting thread-like behavior |
US10467013B2 (en) | 2013-06-07 | 2019-11-05 | Advanced Micro Devices, Inc. | Method and system for yield operation supporting thread-like behavior |
Also Published As
Publication number | Publication date |
---|---|
CN103003796B (en) | 2017-08-25 |
KR101460985B1 (en) | 2014-11-13 |
WO2012006030A2 (en) | 2012-01-12 |
TW201229893A (en) | 2012-07-16 |
KR20130040957A (en) | 2013-04-24 |
CN103003796A (en) | 2013-03-27 |
WO2012006030A3 (en) | 2012-05-24 |
EP2588959A2 (en) | 2013-05-08 |
TWI512611B (en) | 2015-12-11 |
EP2588959A4 (en) | 2014-04-16 |
JP2013527549A (en) | 2013-06-27 |
AU2011276588A1 (en) | 2013-01-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8364739B2 (en) | Sparse matrix-vector multiplication on graphics processor units | |
US9047114B2 (en) | Method and system for analyzing parallelism of program code | |
US9477465B2 (en) | Arithmetic processing apparatus, control method of arithmetic processing apparatus, and a computer-readable storage medium storing a control program for controlling an arithmetic processing apparatus | |
US20110161616A1 (en) | On demand register allocation and deallocation for a multithreaded processor | |
US20100153959A1 (en) | Controlling and dynamically varying automatic parallelization | |
US9600288B1 (en) | Result bypass cache | |
WO2009120981A2 (en) | Vector instructions to enable efficient synchronization and parallel reduction operations | |
An et al. | Speeding up FPGA placement: Parallel algorithms and methods | |
US10877755B2 (en) | Processor load using a bit vector to calculate effective address | |
US10031697B2 (en) | Random-access disjoint concurrent sparse writes to heterogeneous buffers | |
US8949777B2 (en) | Methods and systems for mapping a function pointer to the device code | |
US8935475B2 (en) | Cache management for memory operations | |
CN112130901A (en) | RISC-V based coprocessor, data processing method and storage medium | |
US8490071B2 (en) | Shared prefetching to reduce execution skew in multi-threaded systems | |
US20080244224A1 (en) | Scheduling a direct dependent instruction | |
Zhang et al. | GPU-TLS: An efficient runtime for speculative loop parallelization on gpus | |
US20110320781A1 (en) | Dynamic data synchronization in thread-level speculation | |
US20130166887A1 (en) | Data processing apparatus and data processing method | |
US20130159673A1 (en) | Providing capacity guarantees for hardware transactional memory systems using fences | |
US20210042111A1 (en) | Efficient encoding of high fanout communications | |
US20060242390A1 (en) | Advanced load address table buffer | |
JP2009098819A (en) | Memory system, control method for memory system, and computer system | |
Gong et al. | A novel configuration context cache structure of reconfigurable systems | |
Ermiş | Accelerating local search algorithms for travelling salesman problem using gpu effectively |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: LIU, WEI; WU, YOUFENG; SIGNING DATES FROM 20100916 TO 20100929; REEL/FRAME: 027417/0235 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |