Research article · DOI: 10.1145/2155620.2155670

Residue cache: a low-energy low-area L2 cache architecture via compression and partial hits

Published: 03 December 2011

Abstract

L2 caches are increasingly adopted in embedded systems for higher performance, but their large size increases energy consumption. We propose a low-energy, low-area L2 cache architecture that performs as well as a conventional L2 cache while using 53% less area and around 40% less energy. The architecture pairs an L2 cache with a small cache called the residue cache; lines in both are half the size of a conventional L2 cache line. Conventional L2 cache lines that compress well are stored only in the L2 cache, while poorly compressed lines are split across the L2 and residue caches. Although many conventional L2 cache lines are not fully captured by the residue cache, most accesses to them still do not incur misses, because not all of a line's words are needed immediately; we term such accesses partial hits. The residue cache architecture consumes far less energy and area than conventional L2 cache architectures, and can be combined synergistically with other schemes such as line distillation and ZCA. It is also shown to perform well on a 4-way superscalar processor typical of high-performance systems.
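The placement and lookup policy sketched in the abstract can be illustrated with a minimal behavioral model. This is a hypothetical sketch, not the paper's implementation: it assumes a 64-byte conventional line split into two 32-byte halves, and all class, method, and flag names below are illustrative. A line that compresses into the half-size L2 slot lives there alone; otherwise its overflow goes to the residue cache, and if that overflow is later evicted, a request whose needed word sits in the L2 half still completes as a partial hit.

```python
HALF_LINE = 32  # bytes; half of an assumed 64-byte conventional line (illustrative)

class ResidueCacheModel:
    """Behavioral sketch of the residue-cache placement/lookup policy."""

    def __init__(self):
        self.l2 = {}          # addr -> True if the compressed line fits in the L2 half alone
        self.residue = set()  # addresses whose overflow half is resident in the residue cache

    def fill(self, addr, compressed_size):
        # Well-compressed lines fit in the half-size L2 slot; poorly
        # compressed lines spill their second half into the residue cache.
        fits = compressed_size <= HALF_LINE
        self.l2[addr] = fits
        if not fits:
            self.residue.add(addr)

    def evict_residue(self, addr):
        # The small residue cache may evict an overflow half at any time.
        self.residue.discard(addr)

    def access(self, addr, word_in_first_half):
        if addr not in self.l2:
            return "miss"
        if self.l2[addr]:            # whole line resides in the L2 half
            return "hit"
        if addr in self.residue:     # both halves resident
            return "hit"
        # Residue half evicted: the access still succeeds if the needed
        # word is in the L2 half -- the paper's "partial hit".
        return "partial_hit" if word_in_first_half else "miss"
```

For example, a poorly compressed line whose residue half has been evicted still serves requests to its first half without a miss, which is why many accesses avoid the full miss penalty despite the residue cache's small capacity.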

References

[1] CACTI 6.5. http://www.hpl.hp.com/research/cacti/.
[2] MIPS32 74K. http://www.mips.com/products/cores/32-64-bit-cores/mips32-74k/.
[3] SPEC2000 benchmarks. http://www.specbench.org/osg/cpu2000.
[4] B. Abali, H. Franke, X. Shen, D. E. Poff, and T. B. Smith. Performance of hardware compressed main memory. In HPCA, 2001.
[5] A.-R. Adl-Tabatabai, A. M. Ghuloum, and S. O. Kanaujia. Compression in cache design. In ICS, pages 190--201, 2007.
[6] A. R. Alameldeen and D. A. Wood. Frequent pattern compression: A significance-based compression scheme for L2 caches. Technical Report 1500, University of Wisconsin, Madison, Apr. 2004.
[7] A. R. Alameldeen and D. A. Wood. Adaptive cache compression for high-performance processors. In ISCA, pages 212--223, 2004.
[8] D. H. Albonesi. Selective cache ways: On-demand cache resource allocation. J. Instruction-Level Parallelism, 2, 2000.
[9] ARM. Cortex-A processors. http://www.arm.com/products/processors/cortex-a/.
[10] D. Brooks and M. Martonosi. Dynamically exploiting narrow width operands to improve processor power and performance. In HPCA, pages 13--22, 1999.
[11] D. C. Burger and T. M. Austin. The SimpleScalar tool set, version 2.0. Technical Report CS-TR-1997-1342, University of Wisconsin, Madison, June 1997.
[12] X. Chen, L. Yang, R. P. Dick, L. Shang, and H. Lekatsas. C-Pack: A high-performance microprocessor cache compression algorithm. IEEE Trans. VLSI Syst., 18(8):1196--1208, 2010.
[13] J. Dusser, T. Piquet, and A. Seznec. Zero-content augmented caches. In ICS, pages 46--55, Yorktown Heights, NY, USA, June 2009.
[14] J. H. Edmondson. Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor. Digital Technical Journal, 7(1):119--135, Winter 1995.
[15] M. Ekman and P. Stenstrom. A robust main-memory compression scheme. ACM SIGARCH Computer Architecture News, 33, 2005.
[16] K. Flautner, N. S. Kim, S. Martin, D. Blaauw, and T. Mudge. Drowsy caches: Techniques for reducing leakage power. In ISCA, 2002.
[17] K. Ghose and M. B. Kamble. Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation. In ISLPED, pages 70--75, 1999.
[18] M. Goudarzi and T. Ishihara. SRAM leakage reduction by row/column redundancy under random within-die delay variation. IEEE Trans. VLSI Syst., 18(12):1660--1671, 2010.
[19] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the IEEE 4th Annual Workshop on Workload Characterization, Dec. 2001.
[20] E. G. Hallnor and S. K. Reinhardt. A fully associative software-managed cache design. In ISCA, pages 107--116, 2000.
[21] E. G. Hallnor and S. K. Reinhardt. A unified compressed memory hierarchy. In HPCA, pages 201--212, 2005.
[22] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach, 4th ed. Morgan Kaufmann Publishers Inc., Sept. 2006.
[23] Intel. Intel Atom processor. http://www.intel.com/technology/atom.
[24] S. Kaxiras, Z. Hu, and M. Martonosi. Cache decay: Exploiting generational behavior to reduce cache leakage power. In ISCA, pages 240--251, 2001.
[25] D. Kroft. Retrospective: Lockup-free instruction fetch/prefetch cache organization. In 25 Years ISCA: Retrospectives and Reprints, pages 20--21, 1998.
[26] J.-S. Lee, W.-K. Hong, and S.-D. Kim. An on-chip cache compression technique to reduce decompression overhead and design complexity. Journal of Systems Architecture, 46(15):1365--1382, 2000.
[27] G. Memik, G. Reinman, and W. H. Mangione-Smith. Just say no: Benefits of early cache miss determination. In HPCA, pages 307--316, 2003.
[28] P. Pujara and A. Aggarwal. Restrictive compression techniques to increase level 1 cache capacity. In ICCD, pages 327--333, 2005.
[29] P. Pujara and A. Aggarwal. Increasing the cache efficiency by eliminating noise. In HPCA, pages 145--154, 2006.
[30] P. Pujara and A. Aggarwal. Increasing cache capacity through word filtering. In ICS, pages 222--231, Seattle, Washington, USA, June 2007.
[31] M. K. Qureshi, M. A. Suleman, and Y. N. Patt. Line distillation: Increasing cache capacity by filtering unused words in cache lines. In HPCA, pages 250--259, 2007.
[32] L. Villa, M. Zhang, and K. Asanovic. Dynamic zero compression for cache energy reduction. In MICRO, pages 214--220, 2000.
[33] P. R. Wilson, S. F. Kaplan, and Y. Smaragdakis. The case for compressed caching in virtual memory systems. In Proceedings of the USENIX 1999 Annual Technical Conference, pages 101--116, 1999.
[34] J. Yang and R. Gupta. Energy efficient frequent value data cache design. In MICRO, pages 197--207, 2002.
[35] J. Yang, Y. Zhang, and R. Gupta. Frequent value compression in data caches. In MICRO, pages 258--265, 2000.


    Published In

MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
December 2011, 519 pages
ISBN: 9781450310536
DOI: 10.1145/2155620
Conference Chair: Carlo Galuzzi; General Chair: Luigi Carro; Program Chairs: Andreas Moshovos, Milos Prvulovic

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. L2 cache
    2. area
    3. energy

Qualifiers

• Research-article

Conference

MICRO-44

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Cited By

• (2024) Zero and Narrow-Width Value-Aware Compression for Quantized Convolutional Neural Networks. IEEE Transactions on Computers, 73(1):249-262, Jan. 2024. DOI: 10.1109/TC.2023.3315051
• (2024) Enterprise-Class Cache Compression Design. In HPCA 2024, pages 996-1011, Mar. 2024. DOI: 10.1109/HPCA57654.2024.00080
• (2022) Exploiting Data Compression for Adaptive Block Placement in Hybrid Caches. Electronics, 11(2):240, Jan. 2022. DOI: 10.3390/electronics11020240
• (2022) ENCORE Compression: Exploiting Narrow-width Values for Quantized Deep Neural Networks. In DATE 2022, pages 1503-1508, Mar. 2022. DOI: 10.23919/DATE54114.2022.9774545
• (2022) PR-SSD: Maximizing Partial Read Potential By Exploiting Compression and Channel-Level Parallelism. IEEE Transactions on Computers, 2022. DOI: 10.1109/TC.2022.3178326
• (2022) Exploiting Inter-block Entropy to Enhance the Compressibility of Blocks with Diverse Data. In HPCA 2022, pages 1100-1114, Apr. 2022. DOI: 10.1109/HPCA53966.2022.00084
• (2021) Understanding Cache Compression. ACM Transactions on Architecture and Code Optimization, 18(3):1-27, June 2021. DOI: 10.1145/3457207
• (2021) CID: Co-Architecting Instruction Cache and Decompression System for Embedded Systems. IEEE Transactions on Computers, 70(7):1132-1145, July 2021. DOI: 10.1109/TC.2020.3010062
• (2020) SALE: Smartly Allocating Low-Cost Many-Bit ECC for Mitigating Read and Write Errors in STT-RAM Caches. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pages 1-14, 2020. DOI: 10.1109/TVLSI.2020.2977131
• (2018) EAR. In Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pages 1-11, Nov. 2018. DOI: 10.1145/3243176.3243182
