DOI: 10.1109/HPCA.2007.346185
Article

Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

Published: 10 February 2007

Abstract

High-performance processors employ hardware data prefetching to reduce the negative performance impact of large main memory latencies. While prefetching improves performance substantially on many programs, it can significantly reduce performance on others. Also, prefetching can significantly increase memory bandwidth requirements. This paper proposes a mechanism that incorporates dynamic feedback into the design of the prefetcher to increase the performance improvement provided by prefetching as well as to reduce the negative performance and bandwidth impact of prefetching. Our mechanism estimates prefetcher accuracy, prefetcher timeliness, and prefetcher-caused cache pollution to adjust the aggressiveness of the data prefetcher dynamically. We introduce a new method to track cache pollution caused by the prefetcher at run-time. We also introduce a mechanism that dynamically decides where in the LRU stack to insert the prefetched blocks in the cache based on the cache pollution caused by the prefetcher. Using the proposed dynamic mechanism improves average performance by 6.5% on 17 memory-intensive benchmarks in the SPEC CPU2000 suite compared to the best-performing conventional stream-based data prefetcher configuration, while it consumes 18.7% less memory bandwidth. Compared to a conventional stream-based data prefetcher configuration that consumes a similar amount of memory bandwidth, feedback-directed prefetching provides 13.6% higher performance. Our results show that feedback-directed prefetching eliminates the large negative performance impact incurred on some benchmarks due to prefetching, and it is applicable to stream-based prefetchers, global-history-buffer-based delta correlation prefetchers, and PC-based stride prefetchers.
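The feedback loop described in the abstract can be illustrated with a small software model. The following is a minimal sketch, not the paper's hardware design: the counter names, the five aggressiveness levels, every threshold value, and the simplified decision rule are assumptions made for illustration, since the abstract does not specify them. It shows the two ideas named above: adjusting aggressiveness from estimated accuracy, timeliness, and pollution, and choosing an LRU-stack insertion position from the estimated pollution.

```python
from dataclasses import dataclass

# All thresholds and level bounds are hypothetical placeholders,
# not values taken from the abstract.
ACC_LOW = 0.40               # below this, prefetches are considered inaccurate
LATE_THRESHOLD = 0.01        # above this, prefetches are considered late
POLLUTION_THRESHOLD = 0.005  # above this, the prefetcher is considered polluting
HIGH_POLLUTION = 0.25        # above this, pollution is considered severe
MIN_LEVEL, MAX_LEVEL = 1, 5  # 1 = least aggressive, 5 = most aggressive


@dataclass
class IntervalCounters:
    """Event counts gathered over one sampling interval (hypothetical names)."""
    prefetches_sent: int   # prefetch requests issued to memory
    prefetches_used: int   # prefetched blocks later referenced by a demand access
    prefetches_late: int   # used prefetches that had not arrived when demanded
    demand_misses: int     # all demand misses in the interval
    pollution_misses: int  # demand misses to blocks evicted by a prefetch


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def next_aggressiveness(level: int, c: IntervalCounters) -> int:
    """Pick the aggressiveness level for the next interval.

    Simplified rule (an assumption): throttle down when prefetches are
    inaccurate or polluting, throttle up when they are useful but late,
    otherwise keep the current level.
    """
    accuracy = _ratio(c.prefetches_used, c.prefetches_sent)
    lateness = _ratio(c.prefetches_late, c.prefetches_used)
    pollution = _ratio(c.pollution_misses, c.demand_misses)

    if accuracy < ACC_LOW or pollution > POLLUTION_THRESHOLD:
        step = -1
    elif lateness > LATE_THRESHOLD:
        step = 1
    else:
        step = 0
    return max(MIN_LEVEL, min(MAX_LEVEL, level + step))


def insertion_position(pollution: float, ways: int) -> int:
    """Return the LRU-stack index for a prefetched block.

    Index 0 is the LRU position and ways - 1 is the MRU position. More
    pollution pushes prefetched blocks closer to LRU so they are evicted
    sooner if they turn out to be useless.
    """
    if pollution > HIGH_POLLUTION:
        return 0
    if pollution > POLLUTION_THRESHOLD:
        return ways // 4
    return ways // 2


if __name__ == "__main__":
    late_but_accurate = IntervalCounters(100, 90, 20, 50, 0)
    print(next_aggressiveness(3, late_but_accurate))
    print(insertion_position(0.1, ways=16))
```

In this sketch the counters would be reset at the end of each sampling interval, so the prefetcher reacts to recent program behavior rather than to its whole history. How the counters are collected in hardware, in particular the pollution estimate, is not modeled here.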

Published In

HPCA '07: Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
February 2007
338 pages
ISBN: 1424408040

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Cited By

  • (2024) Contention aware DRAM caching for CXL-enabled pooled memory. Proceedings of the International Symposium on Memory Systems, pp. 157-171. DOI: 10.1145/3695794.3695808. Online publication date: 30-Sep-2024
  • (2024) Data Prefetching on Processors with Heterogeneous Memory. Proceedings of the International Symposium on Memory Systems, pp. 45-60. DOI: 10.1145/3695794.3695800. Online publication date: 30-Sep-2024
  • (2024) Limoncello: Prefetchers for Scale. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 577-590. DOI: 10.1145/3620666.3651373. Online publication date: 27-Apr-2024
  • (2024) RL-CoPref: a reinforcement learning-based coordinated prefetching controller for multiple prefetchers. The Journal of Supercomputing 80:9, pp. 13001-13026. DOI: 10.1007/s11227-024-05938-9. Online publication date: 1-Jun-2024
  • (2023) Clockhands: Rename-free Instruction Set Architecture for Out-of-order Processors. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-16. DOI: 10.1145/3613424.3614272. Online publication date: 28-Oct-2023
  • (2023) CLIP: Load Criticality based Data Prefetching for Bandwidth-constrained Many-core Systems. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 714-727. DOI: 10.1145/3613424.3614245. Online publication date: 28-Oct-2023
  • (2022) T-SKID. Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe, pp. 1389-1394. DOI: 10.5555/3539845.3540168. Online publication date: 14-Mar-2022
  • (2022) Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy. ACM Transactions on Architecture and Code Optimization 20:1, pp. 1-25. DOI: 10.1145/3570304. Online publication date: 16-Dec-2022
  • (2022) MetaSys: A Practical Open-source Metadata Management System to Implement and Evaluate Cross-layer Optimizations. ACM Transactions on Architecture and Code Optimization 19:2, pp. 1-29. DOI: 10.1145/3505250. Online publication date: 24-Mar-2022
  • (2021) Matryoshka: A Coalesced Delta Sequence Prefetcher. Proceedings of the 50th International Conference on Parallel Processing, pp. 1-11. DOI: 10.1145/3472456.3473510. Online publication date: 9-Aug-2021
