Hardware support for thread-level speculation

January 2003

Author:
J. Gregory Steffan
Carnegie Mellon University
Chair:
Todd C. Mowry
Carnegie Mellon University

Publisher:

Carnegie Mellon University
Schenley Park Pittsburgh, PA
United States

ISBN: 978-0-496-92430-1

Order Number: AAI3159472

Pages: 161

Abstract

Novel architectures that support multithreading, such as chip multiprocessors, have become increasingly commonplace over the past decade: examples include the Sun MAJC, IBM Power4, Alpha 21464, Intel Xeon, and HP PA-8800. However, only workloads composed of independent threads can take advantage of these processors; to improve the performance of a single application, that application must be transformed into a parallel version. Unfortunately, parallelization is extremely difficult: the compiler must prove that potential threads are independent, which is not possible for many general-purpose programs (e.g., spreadsheets, web software, and graphics codes) due to their abundant use of pointers, complex control flow, and complex data structures. This dissertation investigates hardware support for Thread-Level Speculation (TLS), a technique that empowers the compiler to optimistically create parallel threads despite uncertainty as to whether those threads are actually independent.

The basic idea behind the approach to thread-level speculation investigated in this dissertation is as follows. First, the compiler uses its global knowledge of control flow to decide how to break a program into speculative threads and how to transform and optimize the code for speculative execution; new architected instructions serve as the interface between software and hardware to manage this new form of parallel processing. Second, hardware support performs the run-time tasks of tracking data dependences between speculative threads, buffering speculative state from the regular memory system, and recovering from failed speculation. The hardware support for TLS presented in this dissertation is unique because it scales seamlessly both within and beyond chip boundaries, allowing this single unified design to apply to a wide variety of multithreaded processors and to larger systems that use those processors as building blocks. Overall, this cooperative and unified approach has many advantages over previous approaches that focus on a specific scale of underlying architecture, or that use either software or hardware in isolation.
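The three run-time tasks named above (tracking dependences, buffering speculative state, and recovering from failed speculation) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the dissertation's hardware design: the names `Epoch`, `run_epoch`, `speculate`, `IDX`, and `update` are hypothetical, parallel execution is simulated sequentially, and violations are detected conservatively at commit time rather than eagerly.

```python
from dataclasses import dataclass, field

@dataclass
class Epoch:
    """One speculative thread (hypothetical model): a loop iteration run early."""
    index: int
    read_set: set = field(default_factory=set)        # addresses loaded from memory
    write_buffer: dict = field(default_factory=dict)  # stores held back from memory

def run_epoch(epoch, body, memory):
    """Run one iteration, buffering its stores and recording its loads."""
    epoch.read_set.clear()
    epoch.write_buffer.clear()

    def load(addr):
        if addr in epoch.write_buffer:      # forward the epoch's own store
            return epoch.write_buffer[addr]
        epoch.read_set.add(addr)            # track a possible dependence
        return memory[addr]

    def store(addr, value):
        epoch.write_buffer[addr] = value    # speculative state stays buffered

    body(epoch.index, load, store)

def speculate(n_iters, body, memory):
    """Run all iterations optimistically, then commit them in program order.

    Returns the number of epochs that had to be re-executed.
    """
    epochs = [Epoch(i) for i in range(n_iters)]
    # Phase 1: every epoch executes against the same pre-commit memory,
    # standing in for threads that run concurrently.
    for e in epochs:
        run_epoch(e, body, memory)
    # Phase 2: commit in original order; an epoch that loaded an address
    # written by an earlier epoch consumed a stale value and must rerun.
    written = set()
    violations = 0
    for e in epochs:
        if e.read_set & written:
            violations += 1
            run_epoch(e, body, memory)      # safe: all earlier epochs committed
        for addr, value in e.write_buffer.items():
            memory[addr] = value
        written.update(e.write_buffer)
    return violations

# Assumed example loop: counters[IDX[i]] += 10 * (i + 1). The target address
# depends on run-time data, so independence cannot be proven statically.
IDX = [0, 1, 2, 1, 3]

def update(i, load, store):
    addr = IDX[i]
    store(addr, load(addr) + 10 * (i + 1))
```

Calling `speculate(len(IDX), update, [0, 0, 0, 0])` leaves memory in the same state as sequential execution. Only the iteration that reuses index 1 is re-executed; the others commit their buffered stores unchanged, which is the optimistic case that speculation is meant to exploit.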

This dissertation: (i) defines the roles of compiler and hardware support for TLS, as well as the interface between them; (ii) presents the design and evaluation of a unified mechanism for supporting thread-level speculation which can handle arbitrary memory access patterns and which is appropriate for any scale of architecture with parallel threads; and (iii) provides a comprehensive evaluation of techniques for enhancing value communication between speculative threads, and quantifies the impact of compiler optimization on these techniques. All proposed mechanisms and techniques are evaluated in detail using a fully automatic, feedback-directed compilation infrastructure and a realistic simulation platform. For the regions of code that are speculatively parallelized by the compiler and executed on the baseline hardware support, performance improves by more than twofold for two of the 15 general-purpose applications studied and by more than 25% for nine others; among the six numeric applications studied, four improve by more than twofold and the other two by more than 60%. These results confirm TLS as a promising way to exploit the naturally multithreaded processing resources of future computer systems.

Recommendations

Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

To achieve high performance, contemporary computer systems rely on two forms of parallelism: instruction-level parallelism (ILP) and thread-level parallelism (TLP). Wide-issue super-scalar processors exploit ILP by executing multiple instructions from a ...

Combining thread level speculation helper threads and runahead execution
ICS '09: Proceedings of the 23rd international conference on Supercomputing

With the current trend toward multicore architectures, improved execution performance can no longer be obtained via traditional single-thread instruction level parallelism (ILP), but, instead, via multithreaded execution. Generating thread-parallel ...

Energy-Efficient Thread-Level Speculation

Chip multiprocessors with thread-level speculation have become the subject of intense research. This work refutes the claim that such a design is necessarily too energy inefficient. In addition, it proposes out-of-order task spawning to exploit more ...
