DOI: 10.1109/MICRO.2012.31
Article

A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch

Published: 01 December 2012

Abstract

Die-stacking technology allows conventional DRAM to be integrated with processors. While numerous opportunities to make use of such stacked DRAM exist, one promising way is to use it as a large cache. Although previous studies show that DRAM caches can deliver performance benefits, there remain inefficiencies as well as significant hardware costs for auxiliary structures. This paper presents two innovations that exploit the bursty nature of memory requests to streamline the DRAM cache. The first is a low-cost Hit-Miss Predictor (HMP) that virtually eliminates the hardware overhead of the previously proposed multi-megabyte Miss Map structure. The second is a Self-Balancing Dispatch (SBD) mechanism that dynamically sends some requests to the off-chip memory even though the request may have hit in the die-stacked DRAM cache. This makes effective use of otherwise idle off-chip bandwidth when the DRAM cache is servicing a burst of cache hits. These techniques, however, are hampered by dirty (modified) data in the DRAM cache. To ensure correctness in the presence of dirty data in the cache, the HMP must verify that a block predicted as a miss is not actually present; otherwise the dirty block must be provided. This verification process can add latency, especially when DRAM cache banks are busy. In a similar vein, SBD cannot redirect requests to off-chip memory when a dirty copy of the block exists in the DRAM cache. To relax these constraints, we introduce a hybrid write policy for the cache that simultaneously supports write-through and write-back policies for different pages. Only a limited number of pages are permitted to operate in a write-back mode at one time, thereby bounding the amount of dirty data in the DRAM cache. By keeping the majority of the DRAM cache clean, most HMP predictions do not need to be verified, and Self-Balancing Dispatch has more opportunities to redistribute requests (i.e., only requests to the limited number of dirty pages must go to the DRAM cache to maintain correctness). Our proposed techniques improve performance compared to the Miss Map-based DRAM cache approach while simultaneously eliminating the costly Miss Map structure.
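
To make the interaction between the three mechanisms concrete, the sketch below models the read-dispatch decision in Python. It is only a behavioral illustration under assumed parameters and names that are not taken from the paper: a region-indexed 2-bit-counter hit-miss predictor, a dirty-page cap of 64 pages, a 4KB page granularity, and a simple queue-occupancy heuristic standing in for Self-Balancing Dispatch.

from collections import OrderedDict

class MostlyCleanDRAMCacheModel:
    """Behavioral sketch of the dispatch decisions described in the abstract.

    All structures and constants here are illustrative assumptions, not the
    paper's implementation: a region-indexed saturating-counter Hit-Miss
    Predictor (HMP), a bounded set of write-back ("dirty-allowed") pages,
    and a queue-occupancy heuristic for Self-Balancing Dispatch (SBD).
    """

    PAGE_SIZE = 4096          # assumed page granularity
    MAX_DIRTY_PAGES = 64      # assumed cap on pages allowed to be write-back
    HMP_REGION_BITS = 14      # assumed coarse region granularity for the HMP

    def __init__(self):
        # 2-bit saturating counters per region: >= 2 means "predict hit"
        self.hmp = {}
        # LRU-ordered set of pages currently permitted to hold dirty data
        self.dirty_pages = OrderedDict()
        # Modeled queue occupancies (outstanding requests per memory)
        self.dram_cache_queue = 0
        self.offchip_queue = 0

    def _region(self, addr):
        return addr >> self.HMP_REGION_BITS

    def hmp_predict_hit(self, addr):
        return self.hmp.get(self._region(addr), 0) >= 2

    def hmp_train(self, addr, was_hit):
        r = self._region(addr)
        c = self.hmp.get(r, 0)
        self.hmp[r] = min(3, c + 1) if was_hit else max(0, c - 1)

    def note_write(self, addr):
        """A write puts its page in write-back mode; the oldest dirty page
        is evicted (i.e., cleaned and reverted to write-through) when the
        cap is exceeded, bounding the amount of dirty data in the cache."""
        page = addr // self.PAGE_SIZE
        self.dirty_pages[page] = True
        self.dirty_pages.move_to_end(page)
        if len(self.dirty_pages) > self.MAX_DIRTY_PAGES:
            self.dirty_pages.popitem(last=False)

    def dispatch(self, addr):
        """Decide where a read goes: 'dram_cache' or 'offchip'."""
        page = addr // self.PAGE_SIZE
        if page in self.dirty_pages:
            # Possibly dirty: must consult the DRAM cache for correctness.
            return "dram_cache"
        if not self.hmp_predict_hit(addr):
            # Predicted miss on a guaranteed-clean page: go straight
            # off-chip with no verification lookup in the DRAM cache.
            return "offchip"
        # Predicted hit on a clean page: either memory can serve it
        # correctly, so SBD may steer the request off-chip when the
        # DRAM cache is congested and off-chip bandwidth is idle.
        if self.dram_cache_queue > 4 and self.offchip_queue == 0:
            return "offchip"
        return "dram_cache"

The invariant the sketch tries to capture is that any page outside dirty_pages is write-through, so off-chip memory always holds valid data for it; that is what lets a predicted miss skip the verification lookup and lets SBD redirect predicted hits without risking stale data.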

Published In

MICRO-45: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
IEEE Computer Society, United States
December 2012, 487 pages
ISBN: 9780769549248
