More Web Proxy on the site http://driver.im/

research-article

Spice: speculative parallel iteration chunk execution

Authors:

Easwaran Raman,

Neil Va hharajani,

David I. AugustAuthors Info & Claims

CGO '08: Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization

Pages 175 - 184

https://doi.org/10.1145/1356058.1356082

Published: 06 April 2008 Publication History

Abstract

The recent trend in the processor industry of packing multiple processor cores in a chip has increased the importance of automatic techniques for extracting thread level parallelism. A promising approach for extracting thread level parallelism in general purpose applications is to apply memory alias or value speculation to break dependences amongst threads and executes them concurrently.

In this work, we present a speculative parallelization technique called Speculative Parallel Iteration Chunk execution (Spice) which relies on a novel software-only value prediction mechanism. Our value prediction technique predicts the loop live-ins of only a few iterations of a given loop, enabling speculative threads to start from those iterations. It also increases the probability of successful speculation by only predicting that the values will be used as live-ins in some future iterations of the loop. These twin properties enable our value prediction scheme to have high prediction accuracies while exposing significant coarse-grained thread-level parallelism. Spice has been implemented as an automatic transformation in a research compiler. The technique results in up to 157% speedup (101% on average) with 4 threads.

References

[1]

R. Allen and K. Kennedy. Optimizing compilers for modern architectures: A dependence--based approach. Morgan Kaufmann Publishers Inc., 2002.

Digital Library

[2]

B. Calder, G. Reinman, and D. Tullsen. Selective value prediction. pages 64--74, 1999.

Digital Library

[3]

M. Cintra, J. F. Martınez, and J. Torrellas. Architectural support for scalable speculative parallelization in shared--memory multiprocessors. In Proceedings of the 27th Annual International Symposium on Computer Architecture, pages 13--24, New York, NY, USA, 2000. ACM Press.

Digital Library

[4]

M. Cintra and J. Torrellas. Eliminating squashes through learning cross--thread violations in speculative parallelization for multiprocessors. In HPCA ’02: Proceedings of the 8th International Symposium on High--Performance Computer Architecture, page 43. IEEE Computer Society, 2002.

Digital Library

[5]

J. C. Corbett. Evaluating deadlock detection methods for concurrent software. IEEE Transactions on Software Engineering, 22(3):161--180, 1996.

Digital Library

[6]

F. H. Dang, H. Yu, and L. Rauchwerger. The R--LRPD test: Speculative parallelization of partially parallel loops. In IPDPS ’02: Proceedings of the 16th International Parallel and Distributed Processing Symposium, page 318, 2002.

Digital Library

[7]

P. A. Emrath and D. A. Padua. Automatic detection of nondeterminacy in parallel programs. In Proceedings of the 1988 ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging, pages 89--99, New York, NY, USA, 1988. ACM Press.

Digital Library

[8]

L. Hammond, B. A. Hubbert, M. Siu, M. K. Prabhu, M. Chen, and K. Olukotun. The Stanford Hydra CMP. IEEE Micro, 20(2):71--84, 2000.

Digital Library

[9]

M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for lock--free data structures. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 289--300, 1993.

Digital Library

[10]

M. H. Lipasti and J. P. Shen. Exceeding the dataflow limit via value prediction. In Proceedings of the 29th International Symposium on Microarchitecture, pages 226--237, December 1996.

Digital Library

[11]

M. H. Lipasti, C. B. Wilkerson, and J. P. Shen. Value locality and load value prediction. In Proceedings of 7th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 138--147, September 1996.

Digital Library

[12]

W. Liu, J. Tuck, L. Ceze, W. Ahn, K. Strauss, J. Renau, and J. Torrellas. Posh: a tls compiler that exploits program structure. In PPoPP ’06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 158--167, 2006.

Digital Library

[13]

P. Marcuello and A. González. Clustered speculative multithreaded processors. In Proceedings of the 13th International Conference on Supercomputing, pages 365--372, New York, NY, USA, 1999. ACM Press.

Digital Library

[14]

P. Marcuello, J. Tubella, and A. Gonzalez. Value prediction for speculative multithreaded architectures. In Proceedings of the 32nd annual ACM/IEEE International Symposium on Microarchitecture. ACM Press, 1999.

Digital Library

[15]

J. T. Oplinger, D. L. Heine, and M. S. Lam. In search of speculative thread--level parallelism. In PACT ’99: Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, page 303, Washington, DC, USA, 1999. IEEE Computer Society.

Digital Library

[16]

M. K. Prabhu and K. Olukotun. Using thread--level speculation to simplify manual parallelization. In Proceedings of the Ninth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 1--12, New York, NY, USA, 2003. ACM Press.

Digital Library

[17]

M. K. Prabhu and K. Olukotun. Exposing speculative thread parallelism in SPEC2000. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 142--152, New York, NY, USA, 2005. ACM Press.

Digital Library

[18]

L. Rauchwerger and D. A. Padua. The LRPD test: Speculative run--time parallelization of loops with privatization and reduction parallelization. IEEE Transactions on Parallel and Distributed Systems, 10(2):160--180, 1999.

Digital Library

[19]

S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems, 15(4):391--411, 1997.

Digital Library

[20]

G. S. Sohi, S. Breach, and T. N. Vijaykumar. Multiscalar processors. In Proceedings of the 22th International Symposium on Computer Architecture, June 1995.

Digital Library

[21]

J. G. Steffan, C. Colohan, A. Zhai, and T. C. Mowry. The STAMPede approach to thread--level speculation. ACM Transactions on Computer Systems, 23(3):253--300, 2005.

Digital Library

[22]

J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. Improving value communication for thread--level speculation. In Proceedings of the 8th International Symposium on High Performance Computer Architecture, pages 65--80, February 2002.

Digital Library

[23]

C. Zilles and G. Sohi. Master/slave speculative parallelization. In Proceedings of the 35th Annual International Symposium on Microarchitecture, pages 85--96, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.

Digital Library

Cited By

Estebanez ALlanos DOrden DPalop B(2022)On the choice of the best chunk size for the speculative execution of loopsPLOS ONE10.1371/journal.pone.026760217:5(e0267602)Online publication date: 17-May-2022
https://doi.org/10.1371/journal.pone.0267602
Dewald FRohde JHochberger CMantel H(2022)Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLSACM Transactions on Reconfigurable Technology and Systems10.1145/350180115:3(1-31)Online publication date: 4-Feb-2022
https://dl.acm.org/doi/10.1145/3501801
Xia YJiang PAgrawal GAmaral JKulkarni M(2019)Enabling prefix sum parallelism pattern for recurrences with principled function reconstructionProceedings of the 28th International Conference on Compiler Construction10.1145/3302516.3307354(17-28)Online publication date: 16-Feb-2019
https://dl.acm.org/doi/10.1145/3302516.3307354
Show More Cited By

Index Terms

Spice: speculative parallel iteration chunk execution
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Recommendations

A continuation-based noninterruptible multithreading processor architecture

Current trend of research on multithreading processors is toward the chip multithreading (CMT), which exploits thread level parallelism (TLP) and improves performance of softwares built on traditional threading components, e.g., Pthread. There exist ...
A cost-driven compilation framework for speculative parallelization of sequential programs
PLDI '04: Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation

The emerging hardware support for thread-level speculation opens new opportunities to parallelize sequential programs beyond the traditional limits. By speculating that many data dependences are unlikely during runtime, consecutive iterations of a ...
A cost-driven compilation framework for speculative parallelization of sequential programs
PLDI '04

The emerging hardware support for thread-level speculation opens new opportunities to parallelize sequential programs beyond the traditional limits. By speculating that many data dependences are unlikely during runtime, consecutive iterations of a ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

CGO '08: Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization

April 2008

235 pages

ISBN:9781595939784

DOI:10.1145/1356058

General Chair:
Mary Lou Soffa
University of Virginia, USA
,
Program Chair:
Evelyn Duesterwald
IBM Research, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 April 2008

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

CGO '08

Sponsor:

CGO '08: 6th Annual IEEE / ACM International Symposium on Code Generation and Optimization

April 5 - 9, 2008

MA, Boston, USA

Acceptance Rates

CGO '08 Paper Acceptance Rate 21 of 66 submissions, 32%;

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

46
Total Citations
View Citations
421
Total Downloads

Downloads (Last 12 months)3
Downloads (Last 6 weeks)1

Reflects downloads up to 11 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Estebanez ALlanos DOrden DPalop B(2022)On the choice of the best chunk size for the speculative execution of loopsPLOS ONE10.1371/journal.pone.026760217:5(e0267602)Online publication date: 17-May-2022
https://doi.org/10.1371/journal.pone.0267602
Dewald FRohde JHochberger CMantel H(2022)Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLSACM Transactions on Reconfigurable Technology and Systems10.1145/350180115:3(1-31)Online publication date: 4-Feb-2022
https://dl.acm.org/doi/10.1145/3501801
Xia YJiang PAgrawal GAmaral JKulkarni M(2019)Enabling prefix sum parallelism pattern for recurrences with principled function reconstructionProceedings of the 28th International Conference on Compiler Construction10.1145/3302516.3307354(17-28)Online publication date: 16-Feb-2019
https://dl.acm.org/doi/10.1145/3302516.3307354
Deiana ECampanoni S(2019)Workload Characterization of Nondeterministic Programs Parallelized by STATS2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)10.1109/ISPASS.2019.00032(190-201)Online publication date: Mar-2019
https://doi.org/10.1109/ISPASS.2019.00032
Jiang PChen LAgrawal GEvripidou SStenström PO'Boyle M(2018)Revealing parallel scans and reductions in recurrences through function reconstructionProceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243204(1-13)Online publication date: 1-Nov-2018
https://dl.acm.org/doi/10.1145/3243176.3243204
Betzel FKhatamifard KSuresh HLilja DSartori JKarpuzcu U(2018)Approximate CommunicationACM Computing Surveys10.1145/314581251:1(1-32)Online publication date: 10-Jan-2018
https://dl.acm.org/doi/10.1145/3145812
Wang ZZhao YLiu YChen ZLv CLi Y(2017)A speculative parallel decompression algorithm on Apache SparkThe Journal of Supercomputing10.1007/s11227-017-2000-373:9(4082-4111)Online publication date: 1-Sep-2017
https://dl.acm.org/doi/10.1007/s11227-017-2000-3
Estebanez ALlanos DGonzalez-Escribano A(2017)Using the Xeon Phi Platform to Run Speculatively-Parallelized CodesInternational Journal of Parallel Programming10.1007/s10766-016-0421-x45:2(225-241)Online publication date: 1-Apr-2017
https://dl.acm.org/doi/10.1007/s10766-016-0421-x
Estebanez ALlanos DGonzalez-Escribano A(2016)A Survey on Thread-Level Speculation TechniquesACM Computing Surveys10.1145/293836949:2(1-39)Online publication date: 30-Jun-2016
https://dl.acm.org/doi/10.1145/2938369
Estebanez ALlanos DGonzalez-Escribano A(2016)New Data Structures to Handle Speculative Parallelization at RuntimeInternational Journal of Parallel Programming10.1007/s10766-014-0347-044:3(407-426)Online publication date: 1-Jun-2016
https://dl.acm.org/doi/10.1007/s10766-014-0347-0
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents