research-article

Path confidence based lookahead prefetching

Authors:

Seth H. Pugsley,

A. L. Narasimha Reddy,

Chris Wilkerson,

Zeshan Chishti

MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture

Article No.: 60, Pages 1 - 12

Published: 15 October 2016

Abstract

Designing prefetchers to maximize system performance often requires a delicate balance between coverage and accuracy. Achieving both high coverage and accuracy is particularly challenging in workloads with complex address patterns, which may require large amounts of history to accurately predict future addresses. This paper describes the Signature Path Prefetcher (SPP), which offers effective solutions for three classic challenges in prefetcher design. First, SPP uses a compressed history based scheme that accurately predicts complex address patterns. Second, unlike other history based algorithms, which miss out on many prefetching opportunities when address patterns make a transition between physical pages, SPP tracks complex patterns across physical page boundaries and continues prefetching as soon as they move to new pages. Finally, SPP uses the confidence it has in its predictions to adaptively throttle itself on a per-prefetch stream basis. In our analysis, we find that SPP improves performance by 27.2% over a no-prefetching baseline, and outperforms the state-of-the-art Best Offset prefetcher by 6.4%. SPP does this with minimal overhead, operating strictly in the physical address space, and without requiring any additional processor core state, such as the PC.
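The three ideas in the abstract (a compressed history signature, lookahead along a predicted path, and confidence-based throttling) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the shift-and-XOR hash, the 12-bit signature width, the 64-block page, the 0.25 threshold, and all names such as `PathPrefetcherSketch`. Continuing a pattern across a page boundary is not modeled; the sketch simply stops at the page edge.

```python
from collections import defaultdict

SIG_BITS = 12                      # assumed signature width
SIG_MASK = (1 << SIG_BITS) - 1
BLOCKS_PER_PAGE = 64               # assumed: 4 KB page, 64 B blocks


def update_signature(sig, delta):
    """Fold one block delta into the compressed history (assumed hash)."""
    return ((sig << 3) ^ (delta & 0x7F)) & SIG_MASK


class PathPrefetcherSketch:
    """Hypothetical, simplified model of signature-path prefetching."""

    def __init__(self, threshold=0.25, max_depth=8):
        self.threshold = threshold     # minimum path confidence to prefetch
        self.max_depth = max_depth     # cap on lookahead steps
        self.page_state = {}           # page -> (last offset, signature)
        # signature -> {delta: times this delta followed the signature}
        self.patterns = defaultdict(lambda: defaultdict(int))

    def access(self, page, offset):
        """Train on one access; return block offsets to prefetch in `page`."""
        sig = 0
        last = self.page_state.get(page)
        if last is not None:
            last_offset, old_sig = last
            delta = offset - last_offset
            if delta == 0:
                return []
            self.patterns[old_sig][delta] += 1
            sig = update_signature(old_sig, delta)
        self.page_state[page] = (offset, sig)
        return self._lookahead(sig, offset)

    def _lookahead(self, sig, offset):
        """Walk the most likely path while its confidence stays high enough."""
        prefetches = []
        confidence = 1.0
        for _ in range(self.max_depth):
            counts = self.patterns.get(sig)
            if not counts:
                break
            delta, hits = max(counts.items(), key=lambda kv: kv[1])
            # Path confidence is the product of per-step confidences.
            confidence *= hits / sum(counts.values())
            if confidence < self.threshold:
                break                  # throttle this stream
            nxt = offset + delta
            if not 0 <= nxt < BLOCKS_PER_PAGE:
                break                  # page edge: cross-page case not modeled
            prefetches.append(nxt)
            offset = nxt
            sig = update_signature(sig, delta)
        return prefetches


# Usage: a unit-stride stream on page 0 trains the shared pattern table.
p = PathPrefetcherSketch()
for off in range(6):
    result = p.access(0, off)
print(result)           # [6, 7, 8, 9, 10, 11, 12, 13]
print(p.access(1, 0))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the pattern table is indexed by signature rather than by page, the first access to page 1 already benefits from what was learned on page 0. Multiplying the per-step confidences makes deeper predictions progressively harder to justify, which is one simple way to obtain per-stream throttling.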



Information & Contributors

Information

Published In

MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture

October 2016

816 pages

General Chairs: Wei-Chung Hsu (NTU, Taiwan), Chia-Lin Yang (NTU, Taiwan)
Program Chairs: Mikko Lipasti (Univ. Wisconsin), Hsien-Hsin Lee (TSMC, Taiwan)

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS\DATC: IEEE Computer Society

Publisher

IEEE Press

Publication History

Published: 15 October 2016

Qualifiers

Research-article

Conference

MICRO-49

Sponsor:

SIGMICRO
IEEE-CS\DATC

MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture

October 15 - 19, 2016

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Citations

Cited By

Tirumalasetty C, Annapareddy N (2024). Contention aware DRAM caching for CXL-enabled pooled memory. Proceedings of the International Symposium on Memory Systems, pp. 157-171. doi:10.1145/3695794.3695808. Online publication date: 30-Sep-2024
https://dl.acm.org/doi/10.1145/3695794.3695808
Liu Y, Chen M, De V (2024). Planaria: Pattern Directed Cross-page Composite Prefetcher. Proceedings of the 61st ACM/IEEE Design Automation Conference, pp. 1-6. doi:10.1145/3649329.3656499. Online publication date: 23-Jun-2024
https://dl.acm.org/doi/10.1145/3649329.3656499
Xue F, Han C, Li X, Wu J, Zhang T, Liu T, Hao Y, Du Z, Guo Q, Zhang F (2024). Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses. ACM Transactions on Architecture and Code Optimization. doi:10.1145/3641853. Online publication date: 22-Jan-2024
https://dl.acm.org/doi/10.1145/3641853
Guo Y, Cao D, Xin X, Zhang Y, Yang J (2023). Uncore Encore: Covert Channels Exploiting Uncore Frequency Scaling. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 843-855. doi:10.1145/3613424.3614259. Online publication date: 28-Oct-2023
https://dl.acm.org/doi/10.1145/3613424.3614259
Jiang S, Yang Q, Ci Y, Hardavellas N, Campanoni S, Grot B, Karpuzcu U (2022). Merging Similar Patterns for Hardware Prefetching. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1012-1026. doi:10.1109/MICRO56248.2022.00071. Online publication date: 1-Oct-2022
https://dl.acm.org/doi/10.1109/MICRO56248.2022.00071
Kaushik A, Pekhimenko G, Patel H (2021). Gretch. ACM Transactions on Architecture and Code Optimization 18:2, pp. 1-25. doi:10.1145/3439803. Online publication date: 9-Feb-2021
https://dl.acm.org/doi/10.1145/3439803
Ye C, Xu Y, Shen X, Liao X, Jin H, Solihin Y, Martínez J, Duato J, John L (2021). Supporting legacy libraries on non-volatile memory. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 443-455. doi:10.1109/ISCA52012.2021.00042. Online publication date: 14-Jun-2021
https://dl.acm.org/doi/10.1109/ISCA52012.2021.00042
Naithani A, Ainsworth S, Jones T, Eeckhout L, Martínez J, Duato J, John L (2021). Vector runahead. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 195-208. doi:10.1109/ISCA52012.2021.00024. Online publication date: 14-Jun-2021
https://dl.acm.org/doi/10.1109/ISCA52012.2021.00024
Ros A, Jimborean A, Martínez J, Duato J, John L (2021). A cost-effective entangling prefetcher for instructions. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 99-111. doi:10.1109/ISCA52012.2021.00017. Online publication date: 14-Jun-2021
https://dl.acm.org/doi/10.1109/ISCA52012.2021.00017
Vavouliotis G, Alvarez L, Karakostas V, Nikas K, Koziris N, Jiménez D, Casas M, Martínez J, Duato J, John L (2021). Exploiting page table locality for agile TLB prefetching. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 85-98. doi:10.1109/ISCA52012.2021.00016. Online publication date: 14-Jun-2021
https://dl.acm.org/doi/10.1109/ISCA52012.2021.00016
