[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1356058.1356082acmconferencesArticle/Chapter ViewAbstractPublication PagescgoConference Proceedingsconference-collections
research-article

Spice: speculative parallel iteration chunk execution

Published: 06 April 2008 Publication History

Abstract

The recent trend in the processor industry of packing multiple processor cores in a chip has increased the importance of automatic techniques for extracting thread level parallelism. A promising approach for extracting thread level parallelism in general purpose applications is to apply memory alias or value speculation to break dependences amongst threads and executes them concurrently.
In this work, we present a speculative parallelization technique called Speculative Parallel Iteration Chunk execution (Spice) which relies on a novel software-only value prediction mechanism. Our value prediction technique predicts the loop live-ins of only a few iterations of a given loop, enabling speculative threads to start from those iterations. It also increases the probability of successful speculation by only predicting that the values will be used as live-ins in some future iterations of the loop. These twin properties enable our value prediction scheme to have high prediction accuracies while exposing significant coarse-grained thread-level parallelism. Spice has been implemented as an automatic transformation in a research compiler. The technique results in up to 157% speedup (101% on average) with 4 threads.

References

[1]
R. Allen and K. Kennedy. Optimizing compilers for modern architectures: A dependence--based approach. Morgan Kaufmann Publishers Inc., 2002.
[2]
B. Calder, G. Reinman, and D. Tullsen. Selective value prediction. pages 64--74, 1999.
[3]
M. Cintra, J. F. Martınez, and J. Torrellas. Architectural support for scalable speculative parallelization in shared--memory multiprocessors. In Proceedings of the 27th Annual International Symposium on Computer Architecture, pages 13--24, New York, NY, USA, 2000. ACM Press.
[4]
M. Cintra and J. Torrellas. Eliminating squashes through learning cross--thread violations in speculative parallelization for multiprocessors. In HPCA ’02: Proceedings of the 8th International Symposium on High--Performance Computer Architecture, page 43. IEEE Computer Society, 2002.
[5]
J. C. Corbett. Evaluating deadlock detection methods for concurrent software. IEEE Transactions on Software Engineering, 22(3):161--180, 1996.
[6]
F. H. Dang, H. Yu, and L. Rauchwerger. The R--LRPD test: Speculative parallelization of partially parallel loops. In IPDPS ’02: Proceedings of the 16th International Parallel and Distributed Processing Symposium, page 318, 2002.
[7]
P. A. Emrath and D. A. Padua. Automatic detection of nondeterminacy in parallel programs. In Proceedings of the 1988 ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging, pages 89--99, New York, NY, USA, 1988. ACM Press.
[8]
L. Hammond, B. A. Hubbert, M. Siu, M. K. Prabhu, M. Chen, and K. Olukotun. The Stanford Hydra CMP. IEEE Micro, 20(2):71--84, 2000.
[9]
M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for lock--free data structures. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 289--300, 1993.
[10]
M. H. Lipasti and J. P. Shen. Exceeding the dataflow limit via value prediction. In Proceedings of the 29th International Symposium on Microarchitecture, pages 226--237, December 1996.
[11]
M. H. Lipasti, C. B. Wilkerson, and J. P. Shen. Value locality and load value prediction. In Proceedings of 7th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 138--147, September 1996.
[12]
W. Liu, J. Tuck, L. Ceze, W. Ahn, K. Strauss, J. Renau, and J. Torrellas. Posh: a tls compiler that exploits program structure. In PPoPP ’06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 158--167, 2006.
[13]
P. Marcuello and A. González. Clustered speculative multithreaded processors. In Proceedings of the 13th International Conference on Supercomputing, pages 365--372, New York, NY, USA, 1999. ACM Press.
[14]
P. Marcuello, J. Tubella, and A. Gonzalez. Value prediction for speculative multithreaded architectures. In Proceedings of the 32nd annual ACM/IEEE International Symposium on Microarchitecture. ACM Press, 1999.
[15]
J. T. Oplinger, D. L. Heine, and M. S. Lam. In search of speculative thread--level parallelism. In PACT ’99: Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, page 303, Washington, DC, USA, 1999. IEEE Computer Society.
[16]
M. K. Prabhu and K. Olukotun. Using thread--level speculation to simplify manual parallelization. In Proceedings of the Ninth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 1--12, New York, NY, USA, 2003. ACM Press.
[17]
M. K. Prabhu and K. Olukotun. Exposing speculative thread parallelism in SPEC2000. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 142--152, New York, NY, USA, 2005. ACM Press.
[18]
L. Rauchwerger and D. A. Padua. The LRPD test: Speculative run--time parallelization of loops with privatization and reduction parallelization. IEEE Transactions on Parallel and Distributed Systems, 10(2):160--180, 1999.
[19]
S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems, 15(4):391--411, 1997.
[20]
G. S. Sohi, S. Breach, and T. N. Vijaykumar. Multiscalar processors. In Proceedings of the 22th International Symposium on Computer Architecture, June 1995.
[21]
J. G. Steffan, C. Colohan, A. Zhai, and T. C. Mowry. The STAMPede approach to thread--level speculation. ACM Transactions on Computer Systems, 23(3):253--300, 2005.
[22]
J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. Improving value communication for thread--level speculation. In Proceedings of the 8th International Symposium on High Performance Computer Architecture, pages 65--80, February 2002.
[23]
C. Zilles and G. Sohi. Master/slave speculative parallelization. In Proceedings of the 35th Annual International Symposium on Microarchitecture, pages 85--96, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.

Cited By

View all
  • (2022)On the choice of the best chunk size for the speculative execution of loopsPLOS ONE10.1371/journal.pone.026760217:5(e0267602)Online publication date: 17-May-2022
  • (2022)Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLSACM Transactions on Reconfigurable Technology and Systems10.1145/350180115:3(1-31)Online publication date: 4-Feb-2022
  • (2019)Enabling prefix sum parallelism pattern for recurrences with principled function reconstructionProceedings of the 28th International Conference on Compiler Construction10.1145/3302516.3307354(17-28)Online publication date: 16-Feb-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
CGO '08: Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
April 2008
235 pages
ISBN:9781595939784
DOI:10.1145/1356058
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 April 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. automatic paralleization
  2. multicore architectures
  3. speculative parallelization
  4. thread level parallelism
  5. value speculation

Qualifiers

  • Research-article

Conference

CGO '08

Acceptance Rates

CGO '08 Paper Acceptance Rate 21 of 66 submissions, 32%;
Overall Acceptance Rate 312 of 1,061 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)1
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2022)On the choice of the best chunk size for the speculative execution of loopsPLOS ONE10.1371/journal.pone.026760217:5(e0267602)Online publication date: 17-May-2022
  • (2022)Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLSACM Transactions on Reconfigurable Technology and Systems10.1145/350180115:3(1-31)Online publication date: 4-Feb-2022
  • (2019)Enabling prefix sum parallelism pattern for recurrences with principled function reconstructionProceedings of the 28th International Conference on Compiler Construction10.1145/3302516.3307354(17-28)Online publication date: 16-Feb-2019
  • (2019)Workload Characterization of Nondeterministic Programs Parallelized by STATS2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)10.1109/ISPASS.2019.00032(190-201)Online publication date: Mar-2019
  • (2018)Revealing parallel scans and reductions in recurrences through function reconstructionProceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243204(1-13)Online publication date: 1-Nov-2018
  • (2018)Approximate CommunicationACM Computing Surveys10.1145/314581251:1(1-32)Online publication date: 10-Jan-2018
  • (2017)A speculative parallel decompression algorithm on Apache SparkThe Journal of Supercomputing10.1007/s11227-017-2000-373:9(4082-4111)Online publication date: 1-Sep-2017
  • (2017)Using the Xeon Phi Platform to Run Speculatively-Parallelized CodesInternational Journal of Parallel Programming10.1007/s10766-016-0421-x45:2(225-241)Online publication date: 1-Apr-2017
  • (2016)A Survey on Thread-Level Speculation TechniquesACM Computing Surveys10.1145/293836949:2(1-39)Online publication date: 30-Jun-2016
  • (2016)New Data Structures to Handle Speculative Parallelization at RuntimeInternational Journal of Parallel Programming10.1007/s10766-014-0347-044:3(407-426)Online publication date: 1-Jun-2016
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media