DOI: 10.1109/ISCA.2005.48
Article

Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization

Published: 01 May 2005

Abstract

The load-store unit is a performance-critical component of a dynamically-scheduled processor. It is also a complex and non-scalable component. Several recently proposed techniques use some form of speculation to simplify the load-store unit and check this speculation by re-executing some of the loads prior to commit. We call such techniques load optimizations. One recent load optimization improves load queue (LQ) scalability by using re-execution rather than associative search to check speculative intra- and inter-thread memory ordering. A second technique improves store queue (SQ) scalability by speculatively filtering some load accesses and some store entries from it and re-executing loads to check that speculation. A third technique speculatively removes redundant loads from the execution engine; re-execution detects false eliminations.

Unfortunately, the benefits of a load optimization are often offset by re-execution itself. Re-execution contends for cache bandwidth with store commit, and serializes load re-execution with subsequent store commit. If a given load optimization requires a sufficient number of load re-executions, the aggregate re-execution cost may overwhelm the benefits of the technique entirely and even cause drastic slowdowns.

Store Vulnerability Window (SVW) is a new mechanism that significantly reduces the re-execution requirements of a given load optimization. SVW is based on monotonic store sequence numbering and an adaptation of Bloom filtering. The cost of a typical SVW implementation is a 1KB buffer and a 16-bit field per LQ entry. Across the three optimizations we study, SVW reduces re-executions by an average of 85%. This reduction relieves cache port contention and removes many of the dynamic serialization events that contribute the bulk of re-execution's cost, allowing these load optimizations to perform up to their full potential. For the speculative SQ, this means the chance to perform at all, as without SVW it posts significant slowdowns.
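
The abstract names SVW's two ingredients, monotonic store sequence numbering and a Bloom-filter-like table, without spelling out the mechanism. The Python sketch below shows one way the two could fit together to decide whether a load needs re-execution. Everything in it is an assumption made for illustration: the class and method names, the 512-entry table (chosen so that 512 sixteen-bit entries total 1KB), the address hash, and the comparison rule. Sequence-number wraparound is not modeled.

```python
class StoreVulnerabilityFilter:
    """Toy model (hypothetical): per-address-hash record of the last committed store."""

    def __init__(self, entries: int = 512) -> None:
        # Assumed sizing: 512 entries x 16 bits = 1KB, matching the stated cost.
        self.entries = entries
        self.last_store_ssn = [0] * entries  # 0 means "no store recorded yet"
        self.committed_ssn = 0               # sequence number of the youngest committed store

    def _index(self, addr: int) -> int:
        # Hypothetical hash: drop the 3 low bits (8-byte words), then wrap.
        # Different addresses may share an entry, as in a Bloom filter.
        return (addr >> 3) % self.entries

    def commit_store(self, addr: int) -> int:
        """Give the committing store the next sequence number and record it."""
        self.committed_ssn += 1
        self.last_store_ssn[self._index(addr)] = self.committed_ssn
        return self.committed_ssn

    def must_reexecute(self, addr: int, safe_ssn: int) -> bool:
        """Decide whether a load needs re-execution before commit.

        safe_ssn is the sequence number of the youngest store the load is
        already known to be consistent with (for example, the youngest store
        committed when the load executed). Only a younger store that maps to
        the same table entry can make the load's value stale.
        """
        return self.last_store_ssn[self._index(addr)] > safe_ssn
```

In this sketch a load skips re-execution whenever no younger store has touched its table entry. Aliasing between addresses can only cause an unnecessary re-execution, never a missed one, which is the property that makes a lossy, Bloom-style table acceptable here.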

Published In

ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture
June 2005
541 pages
ISBN: 076952270X
  • ACM SIGARCH Computer Architecture News, Volume 33, Issue 2
    ISCA 2005
    May 2005
    531 pages
    ISSN: 0163-5964
    DOI: 10.1145/1080695

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

ISCA05
Acceptance Rates

ISCA '05 Paper Acceptance Rate 45 of 194 submissions, 23%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

  • (2023) Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-14. DOI: 10.1145/3579371.3589046. Online publication date: 17-Jun-2023.
  • (2022) Register file prefetching. Proceedings of the 49th Annual International Symposium on Computer Architecture, pp. 410-423. DOI: 10.1145/3470496.3527398. Online publication date: 18-Jun-2022.
  • (2021) Early Address Prediction. ACM Transactions on Architecture and Code Optimization 18:3, pp. 1-22. DOI: 10.1145/3458883. Online publication date: 8-Jun-2021.
  • (2018) The superfluous load queue. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 95-107. DOI: 10.1109/MICRO.2018.00017. Online publication date: 20-Oct-2018.
  • (2018) Hardware supported permission checks on persistent objects for performance and programmability. Proceedings of the 45th Annual International Symposium on Computer Architecture, pp. 466-478. DOI: 10.1109/ISCA.2018.00046. Online publication date: 2-Jun-2018.
  • (2018) Dynamic memory dependence predication. Proceedings of the 45th Annual International Symposium on Computer Architecture, pp. 235-246. DOI: 10.1109/ISCA.2018.00029. Online publication date: 2-Jun-2018.
  • (2017) Non-Speculative Load-Load Reordering in TSO. ACM SIGARCH Computer Architecture News 45:2, pp. 187-200. DOI: 10.1145/3140659.3080220. Online publication date: 24-Jun-2017.
  • (2017) Non-Speculative Load-Load Reordering in TSO. Proceedings of the 44th Annual International Symposium on Computer Architecture, pp. 187-200. DOI: 10.1145/3079856.3080220. Online publication date: 24-Jun-2017.
  • (2010) Federation. ACM Transactions on Architecture and Code Optimization 7:4, pp. 1-38. DOI: 10.1145/1880043.1880046. Online publication date: 30-Dec-2010.
  • (2009) Design and optimization of the store vectors memory dependence predictor. ACM Transactions on Architecture and Code Optimization 6:4, pp. 1-33. DOI: 10.1145/1596510.1596514. Online publication date: 29-Oct-2009.
