[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1669112.1669164acmconferencesArticle/Chapter ViewAbstractPublication PagesmicroConference Proceedingsconference-collections
research-article

Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches

Published: 12 December 2009 Publication History

Abstract

Cache blocks often exhibit a small number of uses during their life time in the last-level cache. Past research has exploited this property in two different ways. First, replacement policies have been designed to evict dead blocks early and retain the potentially live blocks. Second, dynamic insertion policies attempt to victimize single-use blocks (dead on fill) as early as possible, thereby leaving most of the working set undisturbed in the cache. However, we observe that as the last-level cache grows in capacity and associativity, the traditional dead block prediction-based replacement policy loses effectiveness because often the LRU block itself is dead leading to an LRU replacement decision. The benefit of dynamic insertion policies is also small in a large class of applications that exhibit a significant number of cache blocks with small, yet more than one, uses.
To address these drawbacks, we introduce pseudo-last-in-first-out (pseudo-LIFO), a fundamentally new family of replacement heuristics that manages each cache set as a fill stack (as opposed to the traditional access recency stack). We specify three members of this family, namely, dead block prediction LIFO, probabilistic escape LIFO, and probabilistic counter LIFO. The probabilistic escape LIFO (peLIFO) policy is the central contribution of this paper. It dynamically learns the use probabilities of cache blocks beyond each fill stack position to implement a new replacement policy. Our detailed simulation results show that peLIFO, while having less than 1% storage overhead, reduces the execution time by 10% on average compared to a baseline LRU replacement policy for a set of fourteen single-threaded applications on a 2 MB 16-way set associative L2 cache. It reduces the average CPI by 19% on average for a set of twelve multiprogrammed workloads while satisfying a strong fairness requirement on a four-core chip-multiprocessor with an 8 MB 16-way set associative shared L2 cache. Further, it reduces the parallel execution time by 17% on average for a set of six multi-threaded programs on an eight-core chip-multiprocessor with a 4 MB 16-way set associative shared L2 cache. For the architectures considered in this paper, the storage overhead of the peLIFO policy is one-fifth to half of that of a state-of-the-art dead block prediction-based replacement policy. However, the peLIFO policy delivers better average performance for the selected single-threaded and multiprogrammed workloads and similar average performance for the multi-threaded workloads compared to the dead block prediction-based replacement policy.

References

[1]
A. Basu et al. Scavenger: A New Last Level Cache Architecture with Global Block Priority. In Proc. of the 40th Intl. Symp. on Microarchitecture, pages 421--432, December 2007.
[2]
HP Labs. CACTI 4.2. Available at http://www.hpl.hp.com/personal/Norman_Jouppi/cacti4.html.
[3]
Z. Hu, S. Kaxiras, and M. Martonosi. Timekeeping in the Memory System: Predicting and Optimizing Memory Behavior. In Proc. of the 29th Intl. Symp. on Computer Architecture, pages 209--220, May 2002.
[4]
A. Jaleel et al. Adaptive Insertion Policies for Managing Shared Caches. In Proc. of the 17th Intl. Conf. on Parallel Architecture and Compilation Techniques, pages 208--219, October 2008.
[5]
N. P. Jouppi. Improving Direct-mapped Cache Performance by the Addition of a Small Fully Associative Cache and Prefetch Buffers. In Proc. of the 17th Intl. Symp. on Computer Architecture, pages 364--373, June 1990.
[6]
R. E. Kessler and M. D. Hill. Page Placement Algorithms for Large Real-indexed Caches. In ACM Transactions on Computer Systems, 10(4): 338--359, November 1992.
[7]
A-C. Lai, C. Fide, and B. Falsafi. Dead-block Prediction&Dead-block Correlating Prefetchers. In Proc. of the 28th Intl. Symp. on Computer Architecture, pages 144--154, June/July 2001.
[8]
J. Laudon and D. Lenoski. The SGI Origin: A ccNUMA Highly Scalable Server. In Proc. of the 24th Intl. Symp. on Computer Architecture, pages 241--251, June 1997.
[9]
H. Liu et al. Cache Bursts: A New Approach for Eliminating Dead Blocks and Increasing Cache Efficiency. In Proc. of the 41st Intl. Symp. on Microarchitecture, pages 222--233, November 2008.
[10]
M. K. Qureshi and Y. N. Patt. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches. In Proc. of the 39th Intl. Symp. on Microarchitecture, pages 423--432, December 2006.
[11]
M. K. Qureshi et al. Adaptive Insertion Policies for High Performance Caching. In Proc. of the 34th Intl. Symp. on Computer Architecture, pages 381--391, June 2007.
[12]
T. Sherwood et al. Automatically Characterizing Large Scale Program Behavior. In Proc. of the 10th Intl. Conf. on Architectural Support on Programming Languages and Operating Systems, pages 45--57, October 2002.
[13]
S. Srikantaiah, M. Kandemir, and M. J. Irwin. Adaptive Set Pinning: Managing Shared Caches in Chip Multiprocessors. In Proc. of the 13th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 135--144, March 2008.
[14]
D. K. Tam et al. RapidMRC: Approximating L2 Miss Rate Curves on Commodity Systems for Online Optimizations. In Proc. of the 14th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 121--132, March 2009.
[15]
Y. Xie and G. H. Loh. PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-core Shared Caches. In Proc. of the 36th Intl. Symp. on Computer Architecture, pages 174--183, June 2009.
[16]
Z. Zhang, Z. Zhu, and X. Zhang. A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality. In Proc. of the 33rd Intl. Symp. on Microarchitecture, pages 32--41, December 2000.

Cited By

View all
  • (2023)HyGain: High-performance, Energy-efficient Hybrid Gain Cell-based Cache HierarchyACM Transactions on Architecture and Code Optimization10.1145/357283920:2(1-20)Online publication date: 1-Mar-2023
  • (2023)Boustrophedonic Frames: Quasi-Optimal L2 Caching for Textures in GPUs2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT)10.1109/PACT58117.2023.00019(124-136)Online publication date: 21-Oct-2023
  • (2023)CARE: A Concurrency-Aware Enhanced Lightweight Cache Management Framework2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA56546.2023.10071125(1208-1220)Online publication date: Feb-2023
  • Show More Cited By

Index Terms

  1. Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
    December 2009
    601 pages
    ISBN:9781605587981
    DOI:10.1145/1669112
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 December 2009

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. chip-multiprocessor
    2. last-level cache
    3. replacement policy

    Qualifiers

    • Research-article

    Conference

    Micro-42
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)16
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 24 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)HyGain: High-performance, Energy-efficient Hybrid Gain Cell-based Cache HierarchyACM Transactions on Architecture and Code Optimization10.1145/357283920:2(1-20)Online publication date: 1-Mar-2023
    • (2023)Boustrophedonic Frames: Quasi-Optimal L2 Caching for Textures in GPUs2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT)10.1109/PACT58117.2023.00019(124-136)Online publication date: 21-Oct-2023
    • (2023)CARE: A Concurrency-Aware Enhanced Lightweight Cache Management Framework2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA56546.2023.10071125(1208-1220)Online publication date: Feb-2023
    • (2023)ACIC: Admission-Controlled Instruction Cache2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA56546.2023.10071033(165-178)Online publication date: Feb-2023
    • (2023)Optimization strategies for GPUs: an overview of architectural approachesInternational Journal of Parallel, Emergent and Distributed Systems10.1080/17445760.2023.217375238:2(140-154)Online publication date: 5-Feb-2023
    • (2023)An Adaptive Replacement Strategy LWIRR for Shared Last Level Cache L3 in Multi-core ProcessorsProceedings of Trends in Electronics and Health Informatics10.1007/978-981-99-1916-1_31(415-426)Online publication date: 28-Jun-2023
    • (2022)TCOR: A Tile Cache with Optimal Replacement2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA53966.2022.00055(662-675)Online publication date: Apr-2022
    • (2021)SRCP: sharing and reuse-aware replacement policy for the partitioned cache in multicore systemsDesign Automation for Embedded Systems10.1007/s10617-021-09251-z25:3(193-211)Online publication date: 1-Sep-2021
    • (2020)Miss Penalty Aware Cache Replacement for Hybrid Memory SystemsIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2020.2966482(1-1)Online publication date: 2020
    • (2019)On Cost-Driven Collaborative Data Caching: A New Model ApproachIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2018.286864230:3(662-676)Online publication date: 1-Mar-2019
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media