DOI: 10.1145/512529.512544
Article

Post-pass binary adaptation for software-based speculative precomputation

Published: 17 May 2002

Abstract

Recently, a number of thread-based prefetching techniques have been proposed. These techniques aim to reduce the latency of single-threaded applications by using multithreading resources to prefetch memory via speculative prefetch threads. Software-based speculative precomputation (SSP) is one such technique, proposed for multithreaded Itanium models. SSP does not require expensive hardware support; instead, it relies on the compiler to adapt binaries so that they prefetch on otherwise idle hardware thread contexts at run time. This paper presents a post-pass compilation tool for generating SSP-enhanced binaries. The tool is able to: (1) analyze a single-threaded application to generate prefetch threads; (2) identify and embed trigger points in the original binary; and (3) produce a new binary that has the prefetch threads attached. Executing the new binary spawns the speculative prefetch threads, which run concurrently with the main thread. Our results indicate that, for a set of pointer-intensive benchmarks, the prefetching performed by the speculative threads achieves an average speedup of 87% on an in-order processor and 5% on an out-of-order processor.
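
As a rough illustration of why a speculative prefetch thread can help pointer-intensive code, the toy timing model below compares a linked-list walk with and without a helper that runs ahead of the main thread. This is a sketch under stated assumptions, not the paper's tool or its evaluation method: the function name `run_traversal`, the parameters `miss`, `hit`, and `work`, and all cycle counts are hypothetical. The model assumes the helper executes only the pointer-chasing loads (so it pays one miss per node, serially), while the main thread also performs `work` cycles of computation per node.

```python
def run_traversal(n_nodes, miss=100, hit=1, work=50, use_helper=False):
    """Return the cycles the main thread needs to walk n_nodes list nodes.

    Hypothetical model: every first touch of a node is a cache miss.
    With use_helper=True, a prefetch thread is triggered at cycle 0 and
    chases the same pointers, finishing the load of node i at (i+1)*miss.
    """
    t = 0
    for i in range(n_nodes):
        if use_helper:
            # Cycle at which the helper's prefetch of node i completes.
            ready = (i + 1) * miss
            # The main thread waits for an in-flight prefetch,
            # or hits in the cache if the line already arrived.
            load = max(hit, ready - t)
        else:
            # No helper: the main thread takes the full miss itself.
            load = miss
        t += load + work
    return t


if __name__ == "__main__":
    base = run_traversal(1000)
    ssp = run_traversal(1000, use_helper=True)
    # Prints: 150000 100050 1.5
    print(base, ssp, round(base / ssp, 2))
```

In this model the benefit comes entirely from overlapping the main thread's per-node computation with the helper's misses, so the gain shrinks toward zero as `work` approaches zero. The numbers are illustrative only and are not intended to reproduce the speedups reported in the abstract.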

Published In

PLDI '02: Proceedings of the ACM SIGPLAN 2002 conference on Programming language design and implementation
June 2002
338 pages
ISBN:1581134630
DOI:10.1145/512529
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. chaining speculative precomputation
  2. delay minimization
  3. dependence reduction
  4. long-range thread-based prefetching
  5. loop rotation
  6. pointer
  7. post-pass
  8. prediction
  9. scheduling
  10. slack
  11. slicing
  12. speculation
  13. triggering

Qualifiers

  • Article

Conference

PLDI02

Acceptance Rates

PLDI '02 Paper Acceptance Rate 28 of 169 submissions, 17%;
Overall Acceptance Rate 406 of 2,067 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 5
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 10 Dec 2024

Cited By

  • (2019) Bootstrapping. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 687-700. DOI: 10.1145/3297858.3304052. Online publication date: 4-Apr-2019
  • (2015) Self-contained, accurate precomputation prefetching. Proceedings of the 48th International Symposium on Microarchitecture, pp. 153-165. DOI: 10.1145/2830772.2830816. Online publication date: 5-Dec-2015
  • (2014) Hardware/Software Helper Thread Prefetching on Heterogeneous Many Cores. Proceedings of the 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, pp. 214-221. DOI: 10.1109/SBAC-PAD.2014.39. Online publication date: 22-Oct-2014
  • (2014) Execution Drafting. Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 432-444. DOI: 10.1109/MICRO.2014.43. Online publication date: 13-Dec-2014
  • (2013) Load-balanced pipeline parallelism. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.1145/2503210.2503295. Online publication date: 17-Nov-2013
  • (2012) Coalition threading. Proceedings of the 21st international conference on Parallel architectures and compilation techniques, pp. 273-282. DOI: 10.1145/2370816.2370857. Online publication date: 19-Sep-2012
  • (2012) Reducing Cache Pollution of Threaded Prefetching by Controlling Prefetch Distance. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, pp. 1812-1819. DOI: 10.1109/IPDPSW.2012.224. Online publication date: 21-May-2012
  • (2011) Inter-core prefetching for multicore processors using migrating helper threads. ACM SIGPLAN Notices 46:3, pp. 393-404. DOI: 10.1145/1961296.1950411. Online publication date: 5-Mar-2011
  • (2011) Inter-core prefetching for multicore processors using migrating helper threads. ACM SIGARCH Computer Architecture News 39:1, pp. 393-404. DOI: 10.1145/1961295.1950411. Online publication date: 5-Mar-2011
  • (2011) Inter-core prefetching for multicore processors using migrating helper threads. Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems, pp. 393-404. DOI: 10.1145/1950365.1950411. Online publication date: 5-Mar-2011
