research-article

Open access

WADE: Writeback-aware dynamic cache management for NVM-based main memory system

Authors:

Daniel A. Jiménez

ACM Transactions on Architecture and Code Optimization (TACO), Volume 10, Issue 4

Article No.: 51, Pages 1 - 21

https://doi.org/10.1145/2541228.2555307

Published: 01 December 2013

Abstract

Emerging Non-Volatile Memory (NVM) technologies are being explored as potential alternatives to traditional SRAM/DRAM-based memory architectures in future microprocessor design. One of the major disadvantages of NVM is the latency and energy overhead associated with write operations. Mitigation techniques to minimize the write overhead for NVM-based main memory architectures have been studied extensively. However, most prior work focuses on optimization techniques for NVM-based main memory itself, with little attention paid to cache management policies for the Last-Level Cache (LLC).

In this article, we propose a Writeback-Aware Dynamic CachE (WADE) management technique to help mitigate the write overhead in NVM-based memory.<sup>1</sup> The proposal is based on the observation that, when dirty cache blocks are evicted from the LLC and written to NVM-based memory (with PCM as an example), the long latency and high energy of NVM write operations can degrade system performance and power efficiency. Reducing the number of writeback requests from the LLC is therefore critical.
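The observation above can be illustrated with a toy model. The following is a minimal sketch, not taken from the article: it assumes a single cache set managed with plain LRU, and the function name `count_writebacks` and the access trace are hypothetical. It counts how many evictions hit a dirty block, since each of those would become a write to NVM-based memory.

```python
from collections import OrderedDict

def count_writebacks(trace, ways):
    """Simulate one LRU-managed cache set and count dirty evictions.

    trace: iterable of (tag, is_write) pairs (hypothetical format).
    ways:  number of blocks the set can hold.
    """
    blocks = OrderedDict()  # tag -> dirty flag, least recently used first
    writebacks = 0
    for tag, is_write in trace:
        if tag in blocks:
            # Hit: a write makes the block dirty; move it to the MRU position.
            blocks[tag] = blocks[tag] or is_write
            blocks.move_to_end(tag)
            continue
        if len(blocks) >= ways:
            # Miss in a full set: evict the LRU block.
            _, dirty = blocks.popitem(last=False)
            writebacks += dirty  # only dirty victims are written back
        blocks[tag] = is_write
    return writebacks

# Block A is written repeatedly but keeps getting pushed out by clean blocks.
trace = [("A", True), ("B", False), ("C", False),
         ("A", True), ("B", False), ("C", False)]
print(count_writebacks(trace, ways=2))  # prints 2
```

In this trace the repeatedly written block A is evicted twice while dirty, so a two-way LRU set issues two writebacks even though only one block is ever modified.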

The proposed WADE cache management technique tries to keep highly reused dirty cache blocks in the LLC. It predicts which blocks in the LLC are frequently written back, dynamically partitions the LLC sets into a frequent writeback list and a nonfrequent writeback list, and maintains the best size for each list. Our evaluation shows that the technique can reduce the number of writeback requests by 16.5% for memory-intensive single-threaded benchmarks and 10.8% for multicore workloads. It yields a geometric mean speedup of 5.1% for single-threaded applications and 7.6% for multicore workloads. Because fewer writeback requests reach main memory, the technique reduces energy consumption by 8.1% for single-threaded applications and 7.6% for multicore workloads.
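The partitioning idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the article's design: the abstract does not say how blocks are predicted or how list sizes are chosen, so the class `PartitionedSet`, the write-count predictor, the `threshold` parameter, and the fixed `fw_target` size are all hypothetical stand-ins, and the partition is assumed to apply within each set.

```python
from collections import OrderedDict

class PartitionedSet:
    """One cache set split into a frequent and a nonfrequent writeback list."""

    def __init__(self, ways, fw_target, threshold=2):
        self.ways = ways
        self.fw_target = fw_target  # assumed fixed size of the frequent list
        self.threshold = threshold  # writes needed to predict "frequent"
        self.fw = OrderedDict()     # frequent list: tag -> dirty, LRU first
        self.nfw = OrderedDict()    # nonfrequent list: tag -> dirty, LRU first
        self.write_count = {}       # hypothetical predictor state
        self.writebacks = 0

    def access(self, tag, is_write):
        if is_write:
            self.write_count[tag] = self.write_count.get(tag, 0) + 1
        dirty = is_write
        for lst in (self.fw, self.nfw):
            if tag in lst:
                dirty = lst.pop(tag) or is_write  # hit: keep the dirty state
                break
        else:
            if len(self.fw) + len(self.nfw) >= self.ways:
                self._evict()                     # miss in a full set
        frequent = self.write_count.get(tag, 0) >= self.threshold
        (self.fw if frequent else self.nfw)[tag] = dirty  # insert at MRU

    def _evict(self):
        # Evict from the frequent list only if it exceeds its target size
        # or the nonfrequent list is empty.
        if self.fw and (len(self.fw) > self.fw_target or not self.nfw):
            victim = self.fw
        else:
            victim = self.nfw
        _, dirty = victim.popitem(last=False)
        self.writebacks += dirty

trace = [("A", True), ("B", False), ("C", False),
         ("A", True), ("B", False), ("C", False)]
cache_set = PartitionedSet(ways=2, fw_target=1, threshold=1)
for tag, is_write in trace:
    cache_set.access(tag, is_write)
print(cache_set.writebacks)  # prints 0
```

Here the written block A stays protected in the frequent list while the clean blocks B and C replace each other, so no dirty block is evicted. A real design would adjust the list sizes at run time rather than fix `fw_target`.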

Index Terms

WADE: Writeback-aware dynamic cache management for NVM-based main memory system
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory

Information & Contributors

Information

Published In

ACM Transactions on Architecture and Code Optimization Volume 10, Issue 4

December 2013

1046 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/2541228

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 December 2013

Accepted: 01 November 2013

Revised: 01 August 2013

Received: 01 June 2013

Published in TACO Volume 10, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Division of Computing and Communication Foundations
