DOI: 10.1145/3307650.3322207
research-article

Perceptron-based prefetch filtering

Published: 22 June 2019

Abstract

Hardware prefetching is an effective technique for hiding cache miss latencies in modern processor designs. Prefetcher performance can be characterized by two main metrics that are generally at odds with one another: coverage, the fraction of baseline cache misses which the prefetcher brings into the cache; and accuracy, the fraction of prefetches which are ultimately used. An overly aggressive prefetcher may improve coverage at the cost of reduced accuracy. Thus, performance may be harmed by this over-aggressiveness because many resources are wasted, including cache capacity and bandwidth. An ideal prefetcher would have both high coverage and accuracy.
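
The two metrics defined above can be made concrete with a small worked example. The sketch below is illustrative only: the counter names and the sample numbers are assumptions, not figures from the paper, and coverage is approximated by treating every useful prefetch as one eliminated baseline miss.

```python
from dataclasses import dataclass


@dataclass
class PrefetchCounters:
    """Hypothetical event counters gathered over one simulation run."""
    baseline_misses: int    # cache misses observed with the prefetcher disabled
    prefetches_issued: int  # prefetches that were sent to the cache
    useful_prefetches: int  # prefetched lines that were later referenced


def coverage(c: PrefetchCounters) -> float:
    """Fraction of baseline misses removed by the prefetcher.

    Assumption: each useful prefetch removes exactly one baseline miss.
    """
    return c.useful_prefetches / c.baseline_misses if c.baseline_misses else 0.0


def accuracy(c: PrefetchCounters) -> float:
    """Fraction of issued prefetches that were ultimately used."""
    return c.useful_prefetches / c.prefetches_issued if c.prefetches_issued else 0.0


# Made-up numbers showing the tension between the two metrics.
conservative = PrefetchCounters(baseline_misses=1000, prefetches_issued=500, useful_prefetches=400)
aggressive = PrefetchCounters(baseline_misses=1000, prefetches_issued=2000, useful_prefetches=700)

for name, c in (("conservative", conservative), ("aggressive", aggressive)):
    print(f"{name}: coverage={coverage(c):.2f} accuracy={accuracy(c):.2f}")
```

In this made-up case the aggressive tuning raises coverage from 0.40 to 0.70 but drops accuracy from 0.80 to 0.35, so 1300 of its 2000 prefetches consume cache capacity and bandwidth without benefit.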
In this paper, we introduce Perceptron-based Prefetch Filtering (PPF) as a way to increase the coverage of the prefetches generated by an underlying prefetcher without negatively impacting accuracy. PPF enables more aggressive tuning of the underlying prefetcher, leading to increased coverage by filtering out the growing numbers of inaccurate prefetches such an aggressive tuning implies. We also explore a range of features to use to train PPF's perceptron layer to identify inaccurate prefetches. PPF improves performance on a memory-intensive subset of the SPEC CPU 2017 benchmarks by 3.78% for a single-core configuration, and by 11.4% for a 4-core configuration, compared to the underlying prefetcher alone.
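
The filtering idea can be sketched as a small hashed perceptron placed between the underlying prefetcher and the cache. This is a minimal illustration under stated assumptions, not the paper's design: the class name, the two features (triggering PC and address delta), the table size, the single accept threshold, and the training margin are all hypothetical, and the sketch assumes the eventual outcome of every candidate is known at training time.

```python
class PerceptronFilter:
    """Toy perceptron prefetch filter; all sizes and thresholds are illustrative."""

    def __init__(self, num_features, table_size=1024, accept_threshold=0,
                 training_margin=8, weight_limit=15):
        # One weight table per feature, indexed by the feature value.
        self.tables = [[0] * table_size for _ in range(num_features)]
        self.table_size = table_size
        self.accept_threshold = accept_threshold
        self.training_margin = training_margin
        self.weight_limit = weight_limit

    def score(self, features):
        """Sum the weight selected by each feature value."""
        return sum(table[value % self.table_size]
                   for table, value in zip(self.tables, features))

    def accept(self, features):
        """Let the candidate prefetch through only if its score is high enough."""
        return self.score(features) >= self.accept_threshold

    def train(self, features, was_useful):
        """Perceptron update: adjust on a misprediction or a low-confidence score."""
        total = self.score(features)
        predicted_useful = total >= self.accept_threshold
        if predicted_useful == was_useful and abs(total) > self.training_margin:
            return  # confident and correct, so leave the weights alone
        step = 1 if was_useful else -1
        for table, value in zip(self.tables, features):
            index = value % self.table_size
            # Saturating update keeps each weight within a small fixed range.
            table[index] = max(-self.weight_limit,
                               min(self.weight_limit, table[index] + step))


# Hypothetical candidates described by (triggering PC, address delta).
good_candidate = (0x4010, 1)  # assumed to produce prefetches that get used
bad_candidate = (0x4024, 7)   # assumed to produce prefetches that are never used

prefetch_filter = PerceptronFilter(num_features=2)
for _ in range(20):
    prefetch_filter.train(good_candidate, was_useful=True)
    prefetch_filter.train(bad_candidate, was_useful=False)

print(prefetch_filter.accept(good_candidate), prefetch_filter.accept(bad_candidate))
```

With a filter like this in place, the underlying prefetcher can be tuned to propose more candidates, since those the filter scores below the threshold never reach the cache. A candidate with no training history scores 0 and is accepted under the default threshold used here.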



Published In

ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture
June 2019
849 pages
ISBN: 9781450366694
DOI: 10.1145/3307650

Sponsors

In-Cooperation: IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

ISCA '19 paper acceptance rate: 62 of 365 submissions (17%)
Overall acceptance rate: 543 of 3,203 submissions (17%)


Cited By

• (2025) Competitive cost-effective memory access predictor through short-term online SVM and dynamic vocabularies. Future Generation Computer Systems 164 (107592). DOI: 10.1016/j.future.2024.107592. Online publication date: Mar-2025
• (2024) Hyperion: A Highly Effective Page and PC Based Delta Prefetcher. ACM Transactions on Architecture and Code Optimization 21:4 (1-27). DOI: 10.1145/3675398. Online publication date: 19-Nov-2024
• (2024) Exploiting Vector Code Semantics for Efficient Data Cache Prefetching. Proceedings of the 38th ACM International Conference on Supercomputing (98-109). DOI: 10.1145/3650200.3656635. Online publication date: 30-May-2024
• (2024) Limoncello: Prefetchers for Scale. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (577-590). DOI: 10.1145/3620666.3651373. Online publication date: 27-Apr-2024
• (2024) Temporarily Unauthorized Stores: Write First, Ask for Permission Later. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (810-822). DOI: 10.1109/MICRO61859.2024.00065. Online publication date: 2-Nov-2024
• (2024) Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (1202-1216). DOI: 10.1109/ISCA59077.2024.00090. Online publication date: 29-Jun-2024
• (2024) A New Formulation of Neural Data Prefetching. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (1173-1187). DOI: 10.1109/ISCA59077.2024.00088. Online publication date: 29-Jun-2024
• (2024) A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (528-542). DOI: 10.1109/HPCA57654.2024.00046. Online publication date: 2-Mar-2024
• (2024) Rowhammer Cache: A Last-Level Cache for Low-Overhead Rowhammer Tracking. 2024 IEEE International Symposium on Hardware Oriented Security and Trust (HOST) (349-360). DOI: 10.1109/HOST55342.2024.10545410. Online publication date: 6-May-2024
• (2024) Accelerating Graph Analytics Using Attention-Based Data Prefetcher. SN Computer Science 5:5. DOI: 10.1007/s42979-024-02989-w. Online publication date: 13-Jun-2024
