DOI: 10.1109/MICRO.2005.23
Article

Incremental Commit Groups for Non-Atomic Trace Processing

Published: 12 November 2005

Abstract

We introduce techniques to support efficient non-atomic execution of very long traces on a new binary-translation-based, x86-64-compatible VLIW microprocessor. Incrementally committed long traces significantly reduce the computation wasted on exception-induced rollbacks by retaining the correctly committed parts of traces. We divide each scheduled trace into multiple commit groups; a group is committed to the architectural state once all instructions within and prior to it complete without exceptions. Architectural state updates that must become visible only after future commit points are deferred using a simple hardware commit buffer. We employ a commit depth predictor to predict how many groups a trace will complete, thereby eliminating pipeline flushes on repeated rollbacks. Unlike atomic traces, we allow instructions to be freely scheduled across commit points throughout the trace to maximize ILP. Commit groups are formed after scheduling, allowing the commit points terminating each group to be inserted more optimally. Commit groups promote significantly faster convergence on optimized traces, since we salvage partially executed traces and splice the working parts together into new optimized traces. We use detailed models to demonstrate how commit groups substantially improve performance (on average, over 1.5× on SPEC 2000) relative to atomic traces.
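
The commit rule described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the names `Op` and `run_trace`, the register names, and the per-group write lists standing in for the commit buffer are all hypothetical. Each operation carries the commit group it belongs to, operations may be scheduled out of group order, and a group's buffered writes reach the architectural state only once that group and all earlier groups have completed without a fault.

```python
from dataclasses import dataclass


@dataclass
class Op:
    dest: str             # architectural register written (hypothetical names)
    value: int            # result produced by the operation
    group: int            # commit group assigned after scheduling
    faults: bool = False  # True if the operation raises an exception


def run_trace(arch_state, schedule, num_groups):
    """Toy model: run `schedule` in scheduled order, committing group by group.

    Returns (new_state, groups_committed). Writes are held in a per-group
    buffer and applied only when their group and all earlier groups finish.
    """
    state = dict(arch_state)
    buffer = [[] for _ in range(num_groups)]  # deferred writes per group
    pending = [0] * num_groups                # unfinished ops per group
    for op in schedule:
        pending[op.group] += 1

    committed = 0             # number of leading groups already committed
    fault_group = num_groups  # earliest group known to contain a fault

    for op in schedule:
        if op.group < fault_group:
            if op.faults:
                # This group and all later ones will be rolled back.
                fault_group = op.group
            else:
                buffer[op.group].append((op.dest, op.value))
                pending[op.group] -= 1
        # Commit every leading group that is complete and fault-free.
        while committed < fault_group and pending[committed] == 0:
            for dest, value in buffer[committed]:
                state[dest] = value
            committed += 1
    return state, committed


# Group 1's first write is hoisted above group 0; group 2 faults.
schedule = [
    Op("rbx", 20, group=1),
    Op("rax", 10, group=0),
    Op("rcx", 30, group=2, faults=True),
    Op("rbx", 21, group=1),
]
new_state, done = run_trace({"rax": 0, "rbx": 0, "rcx": 0}, schedule, 3)
print(new_state, done)  # {'rax': 10, 'rbx': 21, 'rcx': 0} 2
```

In this example the fault in group 2 discards only that group's work, while groups 0 and 1 stay committed. An atomic trace in the same situation would roll back all three groups and leave the state unchanged.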




Published In

MICRO 38: Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
November 2005
350 pages
ISBN:0769524400

Publisher

IEEE Computer Society

United States

Author Tags

  1. VLIW
  2. binary translation
  3. commitment
  4. trace prediction

Qualifiers

  • Article

Conference

Micro-38
Acceptance Rates

MICRO 38 Paper Acceptance Rate 29 of 147 submissions, 20%;
Overall Acceptance Rate 484 of 2,242 submissions, 22%

Cited By

  • (2013) Discerning the dominant out-of-order performance advantage. ACM SIGPLAN Notices 48:4 (241-252). DOI: 10.1145/2499368.2451143. Online publication date: 16-Mar-2013
  • (2013) Discerning the dominant out-of-order performance advantage. ACM SIGARCH Computer Architecture News 41:1 (241-252). DOI: 10.1145/2490301.2451143. Online publication date: 16-Mar-2013
  • (2013) Discerning the dominant out-of-order performance advantage. Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (241-252). DOI: 10.1145/2451116.2451143. Online publication date: 16-Mar-2013
  • (2009) An energy-efficient checkpointing mechanism for out of order commit processor. Proceedings of the 2009 ACM/IEEE international symposium on Low power electronics and design (183-188). DOI: 10.1145/1594233.1594279. Online publication date: 19-Aug-2009
