research-article
Open access

ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache

Published: 26 January 2012 Publication History

Abstract

Hardware data prefetching is a well-known technique for hiding memory latency. However, in a multicore system with a shared Last-Level Cache (LLC), prefetches issued on behalf of one core consume common resources such as shared cache space and main memory bandwidth. This may degrade the performance of other cores, and even overall system performance, unless the prefetch aggressiveness of each core is controlled from a system-wide standpoint. Moreover, LLCs in commercial chip multiprocessors are increasingly organized in independent banks. In this contribution, we target, for the first time, prefetching in a banked LLC organization and propose ABS, a low-cost hill-climbing controller that runs stand-alone at each LLC bank without requiring inter-bank communication. Using multiprogrammed SPEC2K6 workloads, our analysis shows that the mechanism improves both user-oriented metrics (Harmonic Mean of Speedups by 27% and Fairness by 11%) and system-oriented metrics (Weighted Speedup increases by 22% and Memory Bandwidth Consumption decreases by 14%) over an eight-core baseline system that uses aggressive sequential prefetch with a fixed degree. Similar conclusions hold when varying the number of cores or the LLC size, when running parallel applications, or when controlling other prefetch engines.
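To make the idea of a stand-alone, per-bank hill-climbing controller more concrete, the following is a minimal Python sketch of generic hill climbing over per-core prefetch degrees. It is not the ABS algorithm itself, which the abstract does not detail. The degree ladder `DEGREES`, the class `BankController`, the random choice of core and direction, and the cost function `toy_cost` are all hypothetical names and assumptions introduced here for illustration. The only properties taken from the abstract are that each bank adapts on its own and needs no inter-bank communication.

```python
import random

DEGREES = [0, 1, 2, 4, 8, 16]  # hypothetical ladder of prefetch degrees


class BankController:
    """Illustrative per-bank hill climber; uses only bank-local information."""

    def __init__(self, num_cores, seed=0):
        self.level = [len(DEGREES) - 1] * num_cores  # start fully aggressive
        self.rng = random.Random(seed)
        self.best_cost = None  # cost seen under the last accepted setting
        self.trial = None      # (core, previous_level) of the pending move

    def degrees(self):
        return [DEGREES[i] for i in self.level]

    def end_epoch(self, cost):
        """Take the cost measured at this bank during the epoch just ended."""
        if self.trial is None:
            self.best_cost = cost              # first epoch sets the baseline
        else:
            core, previous = self.trial
            if cost >= self.best_cost:
                self.level[core] = previous    # no improvement: undo the move
            else:
                self.best_cost = cost          # improvement: keep the move
        # Propose the next move: nudge one core's degree one step up or down.
        core = self.rng.randrange(len(self.level))
        step = self.rng.choice((-1, 1))
        self.trial = (core, self.level[core])
        self.level[core] = min(max(self.level[core] + step, 0), len(DEGREES) - 1)
        return self.degrees()


def toy_cost(degrees, targets=(4, 0, 16, 1)):
    # Hypothetical stand-in for a bank-local measurement (lower is better).
    return sum((d - t) ** 2 for d, t in zip(degrees, targets))


ctrl = BankController(num_cores=4)
current = ctrl.degrees()
for _ in range(300):
    current = ctrl.end_epoch(toy_cost(current))
```

In this sketch a move is kept only if the locally observed cost drops, so `best_cost` never increases. A real controller would also have to cope with program phase changes, which make a stored baseline stale.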

    Information & Contributors

    Information

    Published In

    ACM Transactions on Architecture and Code Optimization, Volume 8, Issue 4
    Special Issue on High-Performance Embedded Architectures and Compilers
    January 2012
    765 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/2086696
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 January 2012
    Accepted: 01 November 2011
    Revised: 01 October 2011
    Received: 01 July 2011
    Published in TACO Volume 8, Issue 4

    Author Tags

    1. Prefetch
    2. shared resources management

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Spanish Government and European ERDF

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 122
    • Downloads (Last 6 weeks): 14
    Reflects downloads up to 15 Jan 2025

    Citations

    Cited By

    • (2024) Hyperion: A Highly Effective Page and PC Based Delta Prefetcher. ACM Transactions on Architecture and Code Optimization 21:4 (1-27). DOI: 10.1145/3675398. Online publication date: 1-Jul-2024
    • (2022) Berti: An Accurate Local-Delta Data Prefetcher. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture (975-991). DOI: 10.1109/MICRO56248.2022.00072. Online publication date: 1-Oct-2022
    • (2019) Combining Prefetch Control and Cache Partitioning to Improve Multicore Performance. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (953-962). DOI: 10.1109/IPDPS.2019.00103. Online publication date: May-2019
    • (2017) Band-Pass Prefetching. ACM Transactions on Architecture and Code Optimization 14:2 (1-27). DOI: 10.1145/3090635. Online publication date: 28-Jun-2017
    • (2016) A Survey of Recent Prefetching Techniques for Processor Caches. ACM Computing Surveys 49:2 (1-35). DOI: 10.1145/2907071. Online publication date: 2-Aug-2016
    • (2016) SPAC. IEEE Transactions on Computers 65:12 (3740-3753). DOI: 10.1109/TC.2016.2547392. Online publication date: 1-Dec-2016
    • (2015) CAFFEINE. ACM Transactions on Architecture and Code Optimization 12:3 (1-25). DOI: 10.1145/2806891. Online publication date: 31-Aug-2015
    • (2014) Balanced Prefetching Aggressiveness Controller for NoC-based Multiprocessor. Proceedings of the 27th Symposium on Integrated Circuits and Systems Design (1-7). DOI: 10.1145/2660540.2660541. Online publication date: 1-Sep-2014
    • (2014) Revisiting LP-NUCA Energy Consumption. ACM Transactions on Architecture and Code Optimization 11:2 (1-26). DOI: 10.1145/2632217. Online publication date: 1-Jun-2014
    • (2014) Task Scheduling on Adaptive Multi-Core. IEEE Transactions on Computers 63:10 (2590-2603). DOI: 10.1109/TC.2013.115. Online publication date: 1-Oct-2014
