
WADE: Writeback-aware dynamic cache management for NVM-based main memory system

Published: 01 December 2013

Abstract

Emerging Non-Volatile Memory (NVM) technologies are being explored as potential alternatives to traditional SRAM/DRAM-based memory architectures in future microprocessor designs. One of the major disadvantages of NVM is the latency and energy overhead associated with write operations. Mitigation techniques that minimize the write overhead of NVM-based main memory have been studied extensively. However, most prior work focuses on optimizing the NVM-based main memory itself, with little attention paid to cache management policies for the Last-Level Cache (LLC).
In this article, we propose a Writeback-Aware Dynamic CachE (WADE) management technique to help mitigate the write overhead in NVM-based memory. The proposal is based on the observation that, when dirty cache blocks are evicted from the LLC and written into NVM-based memory (with PCM as an example), the long latency and high energy of those write operations can degrade system performance and power. Thus, reducing the number of writeback requests from the LLC is critical.
The proposed WADE cache management technique tries to keep highly reused dirty cache blocks in the LLC. It predicts which blocks are frequently written back, dynamically partitions the LLC sets into a frequent writeback list and a nonfrequent writeback list, and maintains the best size for each list. Our evaluation shows that the technique can reduce the number of writeback requests by 16.5% for memory-intensive single-threaded benchmarks and 10.8% for multicore workloads. It yields a geometric mean speedup of 5.1% for single-threaded applications and 7.6% for multicore workloads. Due to the reduced number of writeback requests to main memory, the technique reduces energy consumption by 8.1% for single-threaded applications and 7.6% for multicore workloads.
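The two-list idea in the abstract can be illustrated with a small toy model. The sketch below is not the authors' implementation: the class name `TwoListCacheSet`, the per-address `history` counter used as a predictor, and the parameters `ways`, `frequent_target`, and `threshold` are all hypothetical, and the frequent-list size is held fixed here rather than adapted at run time as the abstract describes.

```python
from collections import OrderedDict, defaultdict


class TwoListCacheSet:
    """Toy model of one cache set split into two LRU lists.

    Hypothetical sketch: parameter names and default values are
    illustrative assumptions, not taken from the article.
    """

    def __init__(self, ways=4, frequent_target=2, threshold=1):
        self.ways = ways
        self.frequent_target = frequent_target  # fixed here; not adapted
        self.threshold = threshold
        self.frequent = OrderedDict()     # addr -> dirty flag, LRU first
        self.nonfrequent = OrderedDict()  # addr -> dirty flag, LRU first
        self.history = defaultdict(int)   # addr -> past writeback count
        self.writebacks = 0

    def _evict(self):
        # Evict from the nonfrequent list unless it is empty or the
        # frequent list has grown past its target size.
        if len(self.frequent) > self.frequent_target or not self.nonfrequent:
            victim_list = self.frequent
        else:
            victim_list = self.nonfrequent
        addr, dirty = victim_list.popitem(last=False)  # remove LRU entry
        if dirty:
            self.writebacks += 1      # dirty eviction costs a writeback
            self.history[addr] += 1   # remember it for future predictions

    def access(self, addr, is_write):
        """Simulate one access; return True on a hit."""
        dirty = is_write
        hit = False
        for lst in (self.frequent, self.nonfrequent):
            if addr in lst:
                dirty = lst.pop(addr) or is_write
                hit = True
        if not hit and len(self.frequent) + len(self.nonfrequent) >= self.ways:
            self._evict()
        # A dirty block with enough past writebacks is predicted "frequent".
        predicted_frequent = self.history[addr] >= self.threshold
        target = self.frequent if (dirty and predicted_frequent) else self.nonfrequent
        target[addr] = dirty  # insert at the MRU position
        return hit


cache_set = TwoListCacheSet(ways=2, frequent_target=1, threshold=1)
trace = [("A", True), ("B", False), ("C", False), ("A", True),
         ("D", False), ("E", False), ("A", True)]
hits = [cache_set.access(addr, is_write) for addr, is_write in trace]
print(hits, cache_set.writebacks)
```

In this trace, block `A` is written back once, is then classified as frequent on its next fill, and survives the later clean misses to `D` and `E`, so its final write hits in the cache instead of causing a second writeback.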

    Published In

    ACM Transactions on Architecture and Code Optimization  Volume 10, Issue 4
    December 2013
    1046 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/2541228
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 December 2013
    Accepted: 01 November 2013
    Revised: 01 August 2013
    Received: 01 June 2013
    Published in TACO Volume 10, Issue 4

    Author Tags

    1. Last-level cache
    2. cache segmentation
    3. nonvolatile memory
    4. replacement policy

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2024) MORSE: Memory Overwrite Time Guided Soft Writes to Improve ReRAM Energy and Endurance. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 26-39. DOI: 10.1145/3656019.3676890. Online publication date: 14-Oct-2024
    • (2023) An Efficient NVM-Based Architecture for Intermittent Computing Under Energy Constraints. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 31:6, pp. 725-737. DOI: 10.1109/TVLSI.2023.3266555. Online publication date: Jun-2023
    • (2023) SW-PCM: Graceful Degradation Support in PCM Main Memories by Using Swaption Mechanism. Proceedings of the Future Technologies Conference (FTC) 2023, Volume 3, pp. 514-531. DOI: 10.1007/978-3-031-47457-6_34. Online publication date: 9-Nov-2023
    • (2022) Speculative Load Forwarding Attack on Modern Processors. Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, pp. 1-9. DOI: 10.1145/3508352.3549417. Online publication date: 30-Oct-2022
    • (2022) Don't open row. Proceedings of the 59th ACM/IEEE Design Automation Conference, pp. 823-828. DOI: 10.1145/3489517.3530540. Online publication date: 10-Jul-2022
    • (2022) Planting Fast-Growing Forest by Leveraging the Asymmetric Read/Write Latency of NVRAM-Based Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:10, pp. 3304-3317. DOI: 10.1109/TCAD.2021.3126680. Online publication date: Oct-2022
    • (2020) Algorithmic Fault Detection for RRAM-based Matrix Operations. ACM Transactions on Design Automation of Electronic Systems 25:3, pp. 1-31. DOI: 10.1145/3386360. Online publication date: 13-May-2020
    • (2020) An Energy-aware Online Learning Framework for Resource Management in Heterogeneous Platforms. ACM Transactions on Design Automation of Electronic Systems 25:3, pp. 1-26. DOI: 10.1145/3386359. Online publication date: 13-May-2020
    • (2020) SCRIPT. ACM Transactions on Design Automation of Electronic Systems 25:3, pp. 1-27. DOI: 10.1145/3383445. Online publication date: 13-May-2020
    • (2020) Security of Microfluidic Biochip. ACM Transactions on Design Automation of Electronic Systems 25:3, pp. 1-29. DOI: 10.1145/3382127. Online publication date: 21-Apr-2020
