Research article · DOI: 10.1145/2155620.2155670

Residue cache: a low-energy low-area L2 cache architecture via compression and partial hits

Published: 03 December 2011

Abstract

L2 caches are increasingly adopted in embedded systems for higher performance, but their large size increases energy consumption. We propose a low-energy, low-area L2 cache architecture that performs as well as a conventional L2 cache while using 53% less area and around 40% less energy. The architecture pairs an L2 cache with a small cache called the residue cache; lines in both are half the size of a conventional L2 cache line. Conventional L2 cache lines that compress well are stored only in the L2 cache, while poorly compressed lines are split across the L2 and residue caches. Although many conventional L2 cache lines are not fully captured by the residue cache, most accesses to them still do not incur misses, because not all of a line's words are needed immediately; we term such accesses partial hits. The residue cache architecture consumes far less energy and area than conventional L2 cache architectures, and can be combined synergistically with other schemes such as line distillation and ZCA. It is also shown to perform well on a 4-way superscalar processor typical of high-performance systems.
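The placement and lookup policy sketched in the abstract can be illustrated with a minimal behavioral model. This is a hypothetical sketch, not the paper's implementation: it assumes a 64-byte conventional line split into two 32-byte halves, and all class, method, and flag names below are illustrative. A line that compresses into the half-size L2 slot lives there alone; otherwise its overflow goes to the residue cache, and if that overflow is later evicted, a request whose needed word sits in the L2 half still completes as a partial hit.

```python
HALF_LINE = 32  # bytes; half of an assumed 64-byte conventional line (illustrative)

class ResidueCacheModel:
    """Behavioral sketch of the residue-cache placement/lookup policy."""

    def __init__(self):
        self.l2 = {}          # addr -> True if the compressed line fits in the L2 half alone
        self.residue = set()  # addresses whose overflow half is resident in the residue cache

    def fill(self, addr, compressed_size):
        # Well-compressed lines fit in the half-size L2 slot; poorly
        # compressed lines spill their second half into the residue cache.
        fits = compressed_size <= HALF_LINE
        self.l2[addr] = fits
        if not fits:
            self.residue.add(addr)

    def evict_residue(self, addr):
        # The small residue cache may evict an overflow half at any time.
        self.residue.discard(addr)

    def access(self, addr, word_in_first_half):
        if addr not in self.l2:
            return "miss"
        if self.l2[addr]:            # whole line resides in the L2 half
            return "hit"
        if addr in self.residue:     # both halves resident
            return "hit"
        # Residue half evicted: the access still succeeds if the needed
        # word is in the L2 half -- the paper's "partial hit".
        return "partial_hit" if word_in_first_half else "miss"
```

For example, a poorly compressed line whose residue half has been evicted still serves requests to its first half without a miss, which is why many accesses avoid the full miss penalty despite the residue cache's small capacity.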

References

[1] CACTI 6.5. http://www.hpl.hp.com/research/cacti/.
[2] MIPS32 74K. http://www.mips.com/products/cores/32-64-bit-cores/mips32-74k/.
[3] SPEC2000 benchmarks. http://www.specbench.org/osg/cpu2000.
[4] B. Abali, H. Franke, X. Shen, D. E. Poff, and T. B. Smith. Performance of hardware compressed main memory. In HPCA, 2001.
[5] A.-R. Adl-Tabatabai, A. M. Ghuloum, and S. O. Kanaujia. Compression in cache design. In ICS, pages 190--201, 2007.
[6] A. R. Alameldeen and D. A. Wood. Frequent pattern compression: A significance-based compression scheme for L2 caches. Technical Report 1500, University of Wisconsin, Madison, Apr. 2004.
[7] A. R. Alameldeen and D. A. Wood. Adaptive cache compression for high-performance processors. In ISCA, pages 212--223, 2004.
[8] D. H. Albonesi. Selective cache ways: On-demand cache resource allocation. J. Instruction-Level Parallelism, 2, 2000.
[9] ARM. Cortex-A processors. http://www.arm.com/products/processors/cortex-a/.
[10] D. Brooks and M. Martonosi. Dynamically exploiting narrow width operands to improve processor power and performance. In HPCA, pages 13--22, 1999.
[11] D. C. Burger and T. M. Austin. The SimpleScalar tool set, version 2.0. Technical Report CS-TR-1997-1342, University of Wisconsin, Madison, June 1997.
[12] X. Chen, L. Yang, R. P. Dick, L. Shang, and H. Lekatsas. C-Pack: A high-performance microprocessor cache compression algorithm. IEEE Trans. VLSI Syst., 18(8):1196--1208, 2010.
[13] J. Dusser, T. Piquet, and A. Seznec. Zero-content augmented caches. In ICS, pages 46--55, Yorktown Heights, NY, USA, June 2009.
[14] J. H. Edmondson. Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor. Digital Technical Journal, 7(1):119--135, Winter 1995.
[15] M. Ekman and P. Stenstrom. A robust main-memory compression scheme. ACM SIGARCH Computer Architecture News, 33, 2005.
[16] K. Flautner, N. S. Kim, S. Martin, D. Blaauw, and T. Mudge. Drowsy caches: Techniques for reducing leakage power. In ISCA, 2002.
[17] K. Ghose and M. B. Kamble. Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation. In ISLPED, pages 70--75, 1999.
[18] M. Goudarzi and T. Ishihara. SRAM leakage reduction by row/column redundancy under random within-die delay variation. IEEE Trans. VLSI Syst., 18(12):1660--1671, 2010.
[19] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the IEEE 4th Annual Workshop on Workload Characterization, Dec. 2001.
[20] E. G. Hallnor and S. K. Reinhardt. A fully associative software-managed cache design. In ISCA, pages 107--116, 2000.
[21] E. G. Hallnor and S. K. Reinhardt. A unified compressed memory hierarchy. In HPCA, pages 201--212, 2005.
[22] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach, 4th ed. Morgan Kaufmann Publishers Inc., Sept. 2006.
[23] Intel. Intel Atom processor. http://www.intel.com/technology/atom.
[24] S. Kaxiras, Z. Hu, and M. Martonosi. Cache decay: Exploiting generational behavior to reduce cache leakage power. In ISCA, pages 240--251, 2001.
[25] D. Kroft. Retrospective: Lockup-free instruction fetch/prefetch cache organization. In 25 Years ISCA: Retrospectives and Reprints, pages 20--21, 1998.
[26] J.-S. Lee, W.-K. Hong, and S.-D. Kim. An on-chip cache compression technique to reduce decompression overhead and design complexity. Journal of Systems Architecture, 46(15):1365--1382, 2000.
[27] G. Memik, G. Reinman, and W. H. Mangione-Smith. Just say no: Benefits of early cache miss determination. In HPCA, pages 307--316, 2003.
[28] P. Pujara and A. Aggarwal. Restrictive compression techniques to increase level 1 cache capacity. In ICCD, pages 327--333, 2005.
[29] P. Pujara and A. Aggarwal. Increasing the cache efficiency by eliminating noise. In HPCA, pages 145--154, 2006.
[30] P. Pujara and A. Aggarwal. Increasing cache capacity through word filtering. In ICS, pages 222--231, Seattle, Washington, USA, June 2007.
[31] M. K. Qureshi, M. A. Suleman, and Y. N. Patt. Line distillation: Increasing cache capacity by filtering unused words in cache lines. In HPCA, pages 250--259, 2007.
[32] L. Villa, M. Zhang, and K. Asanovic. Dynamic zero compression for cache energy reduction. In MICRO, pages 214--220, 2000.
[33] P. R. Wilson, S. F. Kaplan, and Y. Smaragdakis. The case for compressed caching in virtual memory systems. In Proceedings of the USENIX 1999 Annual Technical Conference, pages 101--116, 1999.
[34] J. Yang and R. Gupta. Energy efficient frequent value data cache design. In MICRO, pages 197--207, 2002.
[35] J. Yang, Y. Zhang, and R. Gupta. Frequent value compression in data caches. In MICRO, pages 258--265, 2000.


    Published In

MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
December 2011, 519 pages
ISBN: 9781450310536
DOI: 10.1145/2155620
Conference Chair: Carlo Galuzzi; General Chair: Luigi Carro; Program Chairs: Andreas Moshovos, Milos Prvulovic

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. L2 cache
    2. area
    3. energy

Qualifiers

• Research-article

Conference

MICRO-44

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Cited By

• (2024) Zero and Narrow-Width Value-Aware Compression for Quantized Convolutional Neural Networks. IEEE Transactions on Computers, 73(1):249-262, Jan. 2024. DOI: 10.1109/TC.2023.3315051
• (2024) Enterprise-Class Cache Compression Design. In HPCA 2024, pages 996-1011, Mar. 2024. DOI: 10.1109/HPCA57654.2024.00080
• (2022) Exploiting Data Compression for Adaptive Block Placement in Hybrid Caches. Electronics, 11(2):240, Jan. 2022. DOI: 10.3390/electronics11020240
• (2022) ENCORE Compression: Exploiting Narrow-width Values for Quantized Deep Neural Networks. In DATE 2022, pages 1503-1508, Mar. 2022. DOI: 10.23919/DATE54114.2022.9774545
• (2022) PR-SSD: Maximizing Partial Read Potential By Exploiting Compression and Channel-Level Parallelism. IEEE Transactions on Computers, 2022. DOI: 10.1109/TC.2022.3178326
• (2022) Exploiting Inter-block Entropy to Enhance the Compressibility of Blocks with Diverse Data. In HPCA 2022, pages 1100-1114, Apr. 2022. DOI: 10.1109/HPCA53966.2022.00084
• (2021) Understanding Cache Compression. ACM Transactions on Architecture and Code Optimization, 18(3):1-27, June 2021. DOI: 10.1145/3457207
• (2021) CID: Co-Architecting Instruction Cache and Decompression System for Embedded Systems. IEEE Transactions on Computers, 70(7):1132-1145, July 2021. DOI: 10.1109/TC.2020.3010062
• (2020) SALE: Smartly Allocating Low-Cost Many-Bit ECC for Mitigating Read and Write Errors in STT-RAM Caches. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pages 1-14, 2020. DOI: 10.1109/TVLSI.2020.2977131
• (2018) EAR. In Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pages 1-11, Nov. 2018. DOI: 10.1145/3243176.3243182
