research-article

Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches

Author:

Mainak ChaudhuriAuthors Info & Claims

MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

Pages 401 - 412

https://doi.org/10.1145/1669112.1669164

Published: 12 December 2009 Publication History

Get Access

Abstract

Cache blocks often exhibit a small number of uses during their life time in the last-level cache. Past research has exploited this property in two different ways. First, replacement policies have been designed to evict dead blocks early and retain the potentially live blocks. Second, dynamic insertion policies attempt to victimize single-use blocks (dead on fill) as early as possible, thereby leaving most of the working set undisturbed in the cache. However, we observe that as the last-level cache grows in capacity and associativity, the traditional dead block prediction-based replacement policy loses effectiveness because often the LRU block itself is dead leading to an LRU replacement decision. The benefit of dynamic insertion policies is also small in a large class of applications that exhibit a significant number of cache blocks with small, yet more than one, uses.

To address these drawbacks, we introduce pseudo-last-in-first-out (pseudo-LIFO), a fundamentally new family of replacement heuristics that manages each cache set as a fill stack (as opposed to the traditional access recency stack). We specify three members of this family, namely, dead block prediction LIFO, probabilistic escape LIFO, and probabilistic counter LIFO. The probabilistic escape LIFO (peLIFO) policy is the central contribution of this paper. It dynamically learns the use probabilities of cache blocks beyond each fill stack position to implement a new replacement policy. Our detailed simulation results show that peLIFO, while having less than 1% storage overhead, reduces the execution time by 10% on average compared to a baseline LRU replacement policy for a set of fourteen single-threaded applications on a 2 MB 16-way set associative L2 cache. It reduces the average CPI by 19% on average for a set of twelve multiprogrammed workloads while satisfying a strong fairness requirement on a four-core chip-multiprocessor with an 8 MB 16-way set associative shared L2 cache. Further, it reduces the parallel execution time by 17% on average for a set of six multi-threaded programs on an eight-core chip-multiprocessor with a 4 MB 16-way set associative shared L2 cache. For the architectures considered in this paper, the storage overhead of the peLIFO policy is one-fifth to half of that of a state-of-the-art dead block prediction-based replacement policy. However, the peLIFO policy delivers better average performance for the selected single-threaded and multiprogrammed workloads and similar average performance for the multi-threaded workloads compared to the dead block prediction-based replacement policy.

References

[1]

A. Basu et al. Scavenger: A New Last Level Cache Architecture with Global Block Priority. In Proc. of the 40th Intl. Symp. on Microarchitecture, pages 421--432, December 2007.

Abstract

References

Cited By

Index Terms

Recommendations

Temporal-based multilevel correlating inclusive cache replacement

WADE: Writeback-aware dynamic cache management for NVM-based main memory system

Dense Footprint Cache: Capacity-Efficient Die-Stacked DRAM Last Level Cache

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Login options

Full Access

View options

PDF

eReader

Share

Share this Publication link

Share on social media

Affiliations