[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Min-cut program decomposition for thread-level speculation

Published: 09 June 2004 Publication History

Abstract

With billion-transistor chips on the horizon, single-chip multiprocessors (CMPs) are likely to become commodity components. Speculative CMPs use hardware to enforce dependence, allowing the compiler to improve performance by speculating on ambiguous dependences without absolute guarantees of independence. The compiler is responsible for decomposing a sequential program into speculatively parallel threads, while considering multiple performance overheads related to data dependence, load imbalance, and thread prediction. Although the decomposition problem lends itself to a min-cut-based approach, the overheads depend on the thread size, requiring the edge weights to be changed as the algorithm progresses. The changing weights make our approach different from graph-theoretic solutions to the general problem of task scheduling. One recent work uses a set of heuristics, each targeting a specific overhead in isolation, and gives precedence to thread prediction, without comparing the performance of the threads resulting from each heuristic. By contrast, our method uses a sequence of balanced min-cuts that give equal consideration to all the overheads, and adjusts the edge weights after every cut. This method achieves an (geometric) average speedup of 74% for floating-point programs and 23% for integer programs on a four-processor chip, improving on the 52% and 13% achieved by the previous heuristics.

References

[1]
A. Bhowmik and M. Franklin. A General Compiler Framework for Speculative Multithreading. In Proceedings of the 14th ACM Symposium on Parallel Algorithms and Architectures, August 2002.
[2]
S. E. Breach, T. N. Vijaykumar, and G. S. Sohi. The Anatomy of the Register File in a Multiscalar Processor. In Proceedings of the 27th International Symposium on Microarchitecture, pages 181--190, November 1994.
[3]
M. Cintra, J. F. Martínez, and J. Torrellas. Architectural Support for Scalable Speculative Parallelization in Shared-Memory Multiprocessors. In Proceedings of the 27th International Symposium on Computer Architecture, pages 13--24, June 2000.
[4]
L. Codrescu and D. S. Wills. On Dynamic Speculative Thread Partitioning and the MEM-Slicing Algorithm. Journal of Universal Computer Science, 6(10):908--914, 2000.
[5]
J. Edmonds and R. M. Karp. Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems. Journal of the ACM, 19:248--264, 1972.
[6]
L. R. Ford Jr. and D. R. Fulkerson. Flows in Networks. Princeton University Press, 1962.
[7]
M. Franklin and G. S. Sohi. ARB: A Hardware Mechanism for Dynamic Reordering of Memory References. IEEE Transactions on Computers, pages 552--571, May 1996.
[8]
M. J. Garzarán et al. Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors. In Proceedings of the 9th IEEE Symposium on High-Performance Computer Architecture, February 2003.
[9]
S. Gopal, T. N. Vijaykumar, J. E. Smith, and G. S. Sohi. Speculative Versioning Cache. In Proceedings of the 4th IEEE Symposium on High-Performance Computer Architecture, pages 195--205, February 1998.
[10]
L. Hammond, B. A. Nayfeh, and K. Olukotun. A Single-Chip Multiprocessor. IEEE Computer, 30(9):79--85, 1997.
[11]
L. Hammond, M. Willey, and K. Olukotun. Data Speculation Support for a Chip Multiprocessor. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 58--69, October 1998.
[12]
J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach, 3rd Edition. Morgan Kaufmann, New York, 2002.
[13]
IBM. IBM e-Server Power4 System Microarchitecture. Technical report, October 2001.
[14]
I. H. Kazi and D. J. Lilja. JavaSpMT: A Speculative Thread Pipelining Parallelization Model for Java Programs. In Proceedings of the 14th International Parallel and Distributed Processing Symposium, pages 559--564, May 2000.
[15]
S. W. Kim and R. Eigenmann. The Structure of a Compiler for Explicit and Implicit Parallelism. In Proceedings of the Workshop on Languages and Compilers for Parallel Computing (LCPC), August 2001.
[16]
S. W. Kim et al. Reference Idempotency Analysis: A Framework for Optimizing Speculative Execution. In Proceedings of the Symposium on Principles and Practice of Parallel Programming, volume 36, pages 2--11, 2001.
[17]
V. Krishnan and J. Torrellas. Hardware and Software Support for Speculative Execution of Sequential Binaries on a Chip-Multiprocessor. In Proceedings of the International Conference on Supercomputing, pages 85--92, July 1998.
[18]
Y.-K. Kwok and I. Ahmad. Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors. ACM Computing Surveys, 31(4):406--471, 1999.
[19]
A. Moshovos, S. E. Breach, T. N. Vijaykumar, and G. S. Sohi. Dynamic Speculation and Synchronization of Data Dependences. In Proceedings of the 24th International Symposium on Computer Architecture, pages 181--193, June 1997.
[20]
K. Olukotun, L. Hammond, and M. Willey. Improving the Performance of Speculatively Parallel Applications on the Hydra CMP. In Proceedings of the International Conference on Supercomputing, pages 21--30, June 1999.
[21]
C. L. Ooi et al. Multiplex: Unifying Conventional and Speculative Thread-Level Parallelism on a Chip Multiprocessor. In Proceedings of the International Conference on Supercomputing, June 2001.
[22]
P. M. Petersen and D. A. Padua. Static and Dynamic Evaluation of Data Dependence Analysis. In Proceedings of the International Conference on Supercomputing, pages 107--116, July 1993.
[23]
M. K. Prabhu and K. Olukotun. Using Thread-Level Speculation to Simplify Manual Parallelization. In Proceedings of the Symposium on Principles and Practice of Parallel Programming, June 2003.
[24]
V. Sarkar. Determining Average Program Execution Times and their Variance. In Proceedings of the Conference on Programming Language Design and Implementation, pages 298--312, 1989.
[25]
V. Sarkar and J. Hennessy. Partitioning Parallel Programs for Macro-Dataflow. In Proceedings of the Conference on LISP and Functional Programming, pages 202--211, 1986.
[26]
G. Sohi, S. Breach, and T. N. Vijaykumar. Multiscalar Processors. In Proceedings of the 22nd International Symposium on Computer Architecture, June 1995.
[27]
J. G. Steffan. Hardware Support for Thread-Level Speculation. PhD thesis, Carnegie-Mellon University, April 2003.
[28]
J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A Scalable Approach to Thread-Level Speculation. In Proceedings of the 27th International Symposium on Computer Architecture, pages 1--12, June 2000.
[29]
J. G. Steffan and T. C. Mowry. The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization. In Proceedings of the 4th IEEE Symposium on High-Performance Computer Architecture, pages 2--13, February 1998.
[30]
J.-Y. Tsai and P.-C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. In Proceedings of the International Conference on Parallel Architecture and Compiler Techniques, pages 35--46, October 1996.
[31]
T. N. Vijaykumar and G. S. Sohi. Task Selection for a Multiscalar Processor. In Proceedings of the 31st International Symposium on Microarchitecture, December 1998.
[32]
E. Waingold et al. Baring It All to Software: Raw Machines. IEEE Computer, 30(9):86--93, 1997.
[33]
D. W. Wall. Predicting Program Behavior Using Real or Estimated Profiles. In Proceedings of the Conference on Programming Language Design and Implementation, volume 26, pages 59--70, June 1991.
[34]
D. B. West. Introduction to Graph Theory - Second Edition. Prentice-Hall, 2001.
[35]
H. H. Yang and D. F. Wong. Efficient Network Flow Based Min-Cut Balanced Partitioning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 15(12), December 1996.
[36]
Y. Zhang, L. Rauchwerger, and J. Torrellas. Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors. In Proceedings of the 4th IEEE Symposium on High-Performance Computer Architecture, pages 162--173, 1998.

Cited By

View all
  • (2024)IDaTPA: importance degree based thread partitioning approach in thread level speculationDiscover Computing10.1007/s10791-024-09440-x27:1Online publication date: 19-Jun-2024
  • (2021)Identifying complaints based on semi-supervised mincutsExpert Systems with Applications10.1016/j.eswa.2021.115668(115668)Online publication date: Aug-2021
  • (2019)Register Requirement Minimization of Fixed-Depth Pipelines for Streaming Data Applications2019 32nd IEEE International System-on-Chip Conference (SOCC)10.1109/SOCC46988.2019.1570548393(406-411)Online publication date: Sep-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 39, Issue 6
PLDI '04
May 2004
299 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/996893
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI '04: Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
    June 2004
    310 pages
    ISBN:1581138075
    DOI:10.1145/996841
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2004
Published in SIGPLAN Volume 39, Issue 6

Check for updates

Author Tags

  1. chip multiprocessor
  2. min-cut
  3. partitioning
  4. program decomposition
  5. thread-level speculation

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)2
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)IDaTPA: importance degree based thread partitioning approach in thread level speculationDiscover Computing10.1007/s10791-024-09440-x27:1Online publication date: 19-Jun-2024
  • (2021)Identifying complaints based on semi-supervised mincutsExpert Systems with Applications10.1016/j.eswa.2021.115668(115668)Online publication date: Aug-2021
  • (2019)Register Requirement Minimization of Fixed-Depth Pipelines for Streaming Data Applications2019 32nd IEEE International System-on-Chip Conference (SOCC)10.1109/SOCC46988.2019.1570548393(406-411)Online publication date: Sep-2019
  • (2019)TPAoPI:A Thread Partitioning Approach Based on Procedure Importance in Speculative Multithreading2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)10.1109/HPCC/SmartCity/DSS.2019.00325(2339-2344)Online publication date: Aug-2019
  • (2019)ProCTA: program characteristic-based thread partition approachThe Journal of Supercomputing10.1007/s11227-019-02943-1Online publication date: 3-Jul-2019
  • (2015)Compiler-Driven Software Speculation for Thread-Level ParallelismACM Transactions on Programming Languages and Systems10.1145/282150538:2(1-45)Online publication date: 22-Dec-2015
  • (2014)A Static Greedy and Dynamic Adaptive Thread Spawning Approach for Loop-Level ParallelismJournal of Computer Science and Technology10.1007/s11390-014-1482-129:6(962-975)Online publication date: 17-Nov-2014
  • (2014)A thread partitioning approach for speculative multithreadingThe Journal of Supercomputing10.1007/s11227-013-1000-167:3(778-805)Online publication date: 1-Mar-2014
  • (2009)A study of potential parallelism among traces in Java programsScience of Computer Programming10.1016/j.scico.2009.01.00474:5-6(296-313)Online publication date: 1-Mar-2009
  • (2006)A lifetime optimal algorithm for speculative PREACM Transactions on Architecture and Code Optimization10.1145/1138035.11380363:2(115-155)Online publication date: 1-Jun-2006
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media