Article

Incremental Commit Groups for Non-Atomic Trace Processing

Authors:

Matt T. Yourst,

Kanad Ghose

MICRO 38: Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture

Pages 67 - 80

https://doi.org/10.1109/MICRO.2005.23

Published: 12 November 2005

Abstract

We introduce techniques to support efficient non-atomic execution of very long traces on a new binary-translation-based, x86-64-compatible VLIW microprocessor. Incrementally committed long traces significantly reduce wasted computations on exception-induced rollbacks by retaining the correctly committed parts of traces. We divide each scheduled trace into multiple commit groups; groups are committed to the architectural state after all instructions within and prior to each group complete without exceptions. Architectural state updates that should become visible only after future commit points are deferred using a simple hardware commit buffer. We employ a commit depth predictor to predict how many groups a trace will complete, thereby eliminating pipeline flushes on repeated rollbacks. Unlike atomic traces, we allow instructions to be freely scheduled across commit points throughout the trace to maximize ILP. Commit groups are formed after scheduling, allowing the commit points terminating each group to be inserted more optimally. Commit groups promote significantly faster convergence on optimized traces, since we salvage partially executed traces and splice the working parts together into new optimized traces. We use detailed models to demonstrate how commit groups substantially improve performance (on average, over 1.5× on SPEC 2000) relative to atomic traces.
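The difference between atomic and incrementally committed traces can be illustrated with a small software model. The sketch below is not the authors' hardware design: `CommitBuffer`, `run_trace`, the register names, and the three-group trace are all hypothetical, and the model executes updates strictly in order, so it does not capture scheduling of instructions across commit points. It only shows how updates deferred in a buffer become architecturally visible at each commit point, and how an exception discards just the uncommitted group.

```python
from dataclasses import dataclass, field


@dataclass
class CommitBuffer:
    """Toy stand-in: holds deferred updates until a commit point is reached."""
    pending: dict = field(default_factory=dict)

    def write(self, name, value):
        self.pending[name] = value

    def commit_into(self, arch_state):
        arch_state.update(self.pending)
        self.pending.clear()

    def discard(self):
        self.pending.clear()


def run_trace(groups, arch_state):
    """Execute commit groups in order and return how many groups committed.

    Each group is a list of (name, value, faults) updates. A faulting
    update discards the current group but keeps all earlier groups.
    """
    buffer = CommitBuffer()
    committed = 0
    for group in groups:
        for name, value, faults in group:
            if faults:
                buffer.discard()        # roll back to the last commit point
                return committed
            buffer.write(name, value)
        buffer.commit_into(arch_state)  # the commit point ends the group
        committed += 1
    return committed


# Hypothetical three-group trace; the second group raises an exception.
trace = [
    [("rax", 1, False), ("rbx", 2, False)],
    [("rcx", 3, False), ("rdx", 4, True)],
    [("rsi", 5, False)],
]

state = {}
depth = run_trace(trace, state)

# An atomic trace is modelled as one group spanning the whole trace.
atomic_state = {}
atomic_depth = run_trace([[u for g in trace for u in g]], atomic_state)

print(depth, state)                # 1 {'rax': 1, 'rbx': 2}
print(atomic_depth, atomic_state)  # 0 {}
```

In this toy run the grouped trace retains the first group's results while the atomic version loses everything. The returned depth is the kind of quantity a commit depth predictor would try to anticipate, although the predictor itself is not modelled here.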

Index Terms

Incremental Commit Groups for Non-Atomic Trace Processing
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. Hardware
  1. Emerging technologies

Published In

MICRO 38: Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture

November 2005

350 pages

ISBN:0769524400

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

Micro-38

Sponsor:

SIGMICRO

Micro-38: The 38th Annual IEEE/ACM International Symposium on Microarchitecture

November 12 - 16, 2005

Barcelona, Spain

Acceptance Rates

MICRO 38 Paper Acceptance Rate 29 of 147 submissions, 20%;

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Cited By

McFarlin D, Tucker C, Zilles C (2013). Discerning the dominant out-of-order performance advantage. ACM SIGPLAN Notices 48:4 (241-252). doi:10.1145/2499368.2451143. Online publication date: 16-Mar-2013
https://dl.acm.org/doi/10.1145/2499368.2451143
McFarlin D, Tucker C, Zilles C (2013). Discerning the dominant out-of-order performance advantage. ACM SIGARCH Computer Architecture News 41:1 (241-252). doi:10.1145/2490301.2451143. Online publication date: 16-Mar-2013
https://dl.acm.org/doi/10.1145/2490301.2451143
McFarlin D, Tucker C, Zilles C, Sarkar V, Bodik R (2013). Discerning the dominant out-of-order performance advantage. Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (241-252). doi:10.1145/2451116.2451143. Online publication date: 16-Mar-2013
https://dl.acm.org/doi/10.1145/2451116.2451143
Zeng H, Yourst M, Ghose K, Henkel J, Keshavarzi A, Chang N, Ghani T (2009). An energy-efficient checkpointing mechanism for out of order commit processor. Proceedings of the 2009 ACM/IEEE international symposium on Low power electronics and design (183-188). doi:10.1145/1594233.1594279. Online publication date: 19-Aug-2009
https://dl.acm.org/doi/10.1145/1594233.1594279
