DOI: 10.5555/3195638.3195711
research-article

Path confidence based lookahead prefetching

Published: 15 October 2016

Abstract

Designing prefetchers to maximize system performance often requires a delicate balance between coverage and accuracy. Achieving both high coverage and high accuracy is particularly challenging in workloads with complex address patterns, which may require large amounts of history to accurately predict future addresses. This paper describes the Signature Path Prefetcher (SPP), which offers effective solutions for three classic challenges in prefetcher design. First, SPP uses a compressed-history-based scheme that accurately predicts complex address patterns. Second, unlike other history-based algorithms, which miss many prefetching opportunities when address patterns cross from one physical page to another, SPP tracks complex patterns across physical page boundaries and continues prefetching as soon as they move to new pages. Finally, SPP uses the confidence it has in its predictions to adaptively throttle itself on a per-prefetch-stream basis. In our analysis, we find that SPP improves performance by 27.2% over a no-prefetching baseline, and outperforms the state-of-the-art Best Offset prefetcher by 6.4%. SPP does this with minimal overhead, operating strictly in the physical address space, and without requiring any additional processor core state, such as the PC.
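
The three ideas in the abstract (a compressed history signature, continuation across page boundaries, and confidence-based throttling) can be illustrated with a toy software model. The abstract gives no implementation details, so everything below is an assumption made for illustration: the class name `ToyPathPrefetcher`, the signature width and shift, the threshold, the depth limit, and the `boundary` table used to resume a pattern on a new page are all hypothetical and are not claimed to match the hardware design in the paper.

```python
from collections import defaultdict

# All constants are illustrative assumptions, not values from the abstract.
SIG_BITS = 12          # width of the compressed history signature
SIG_SHIFT = 3          # bits shifted in per observed delta
BLOCKS_PER_PAGE = 64   # e.g. 4 KiB page / 64 B cache block
THRESHOLD = 0.6        # stop looking ahead below this path confidence
MAX_DEPTH = 4          # hard cap on lookahead depth


def update_signature(sig, delta):
    """Fold one block delta into the compressed history signature."""
    return ((sig << SIG_SHIFT) ^ (delta & 0x7F)) & ((1 << SIG_BITS) - 1)


class ToyPathPrefetcher:
    def __init__(self):
        self.page_state = {}   # page -> (last offset, current signature)
        # signature -> {delta -> times that delta followed the signature}
        self.patterns = defaultdict(lambda: defaultdict(int))
        self.boundary = {}     # landing offset in next page -> signature

    def access(self, page, offset):
        """Train on one demand access; return (page, offset) prefetch candidates."""
        if page in self.page_state:
            last, sig = self.page_state[page]
            delta = offset - last
            if delta == 0:
                return []
            self.patterns[sig][delta] += 1
            sig = update_signature(sig, delta)
        else:
            # First touch of a page: inherit a signature if a pattern on
            # another page was predicted to land on this offset.
            sig = self.boundary.pop(offset, 0)
        self.page_state[page] = (offset, sig)
        return self._lookahead(page, offset, sig)

    def _lookahead(self, page, offset, sig):
        prefetches, confidence = [], 1.0
        for _ in range(MAX_DEPTH):
            counts = self.patterns.get(sig)
            if not counts:
                break
            delta, hits = max(counts.items(), key=lambda kv: kv[1])
            # Path confidence is the product of per-step confidences.
            confidence *= hits / sum(counts.values())
            if confidence < THRESHOLD:
                break  # throttle this stream
            offset += delta
            sig = update_signature(sig, delta)
            if not 0 <= offset < BLOCKS_PER_PAGE:
                # Pattern leaves the page: remember where it would land.
                self.boundary[offset % BLOCKS_PER_PAGE] = sig
                break
            prefetches.append((page, offset))
        return prefetches


if __name__ == "__main__":
    p = ToyPathPrefetcher()
    for off in range(64):          # unit-stride walk through page 0
        p.access(0, off)
    print(p.access(7, 0))          # first touch of an unrelated page
```

In this sketch, the first access to page 7 inherits the signature learned on page 0, so candidates are produced immediately rather than after a fresh warm-up. A signature whose observed deltas are evenly split produces no candidates, because its path confidence falls below `THRESHOLD` on the first step.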

Published In

MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture
October 2016
816 pages

Publisher

IEEE Press

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
