research-article

Last-level cache deduplication

Authors:

Samira M. Khan,

Daniel A. Jiménez,

Gabriel H. Loh

ICS '14: Proceedings of the 28th ACM international conference on Supercomputing

Pages 53 - 62

https://doi.org/10.1145/2597652.2597655

Published: 10 June 2014

Abstract

Caches are essential to the performance of modern microprocessors. Much recent work on last-level caches has focused on exploiting reference locality to improve efficiency. However, value redundancy is another potential source of improvement: we find that many blocks in the working sets of typical benchmark programs hold identical values. We propose cache deduplication, which effectively increases last-level cache capacity. Rather than exploiting specific kinds of value redundancy through compression, as previous work does, our scheme detects duplicate data blocks and stores only one copy of the data, which can be accessed through multiple physical addresses. We find that typical benchmarks exhibit significant value redundancy, far beyond the zero-content blocks one would expect in any program. Compared to an 8MB last-level cache, our deduplicated cache increases effective capacity by an average of 112% while reducing physical area by 12.2%, yielding an average performance improvement of 15.2%.
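The abstract describes the mechanism only at a high level: detect duplicate blocks, keep one copy, and let several physical addresses reach it. The Python sketch below is a toy model under stated assumptions, not the authors' design. It assumes a tag store with LRU replacement whose entries point into a smaller data store, a reference count on each data entry, and an exact-contents index standing in for whatever duplicate-detection logic the paper actually uses. The class and method names (`DedupCache`, `fill`, `lookup`, `effective_capacity_ratio`) and the 64-byte block size are hypothetical.

```python
from collections import OrderedDict

BLOCK_SIZE = 64  # bytes per cache block (assumed for illustration)


class DedupCache:
    """Toy deduplicated cache: several tags may point at one data entry.

    Hypothetical organization (not taken from the paper):
      tags       physical block address -> data-entry id, in LRU order
      data       data-entry id -> [block contents, reference count]
      by_content block contents -> data-entry id, to find an existing copy
    The data store is smaller than the tag store, so sharing identical
    blocks lets more addresses stay resident than there are data entries.
    """

    def __init__(self, num_tags, num_data):
        if not (1 <= num_data <= num_tags):
            raise ValueError("need 1 <= num_data <= num_tags")
        self.num_tags = num_tags
        self.num_data = num_data
        self.tags = OrderedDict()
        self.data = {}
        # Exact-match index on full contents. Hardware would need a cheaper
        # detector (e.g. a hash plus a full comparison); that is an assumption.
        self.by_content = {}
        self.next_id = 0

    def _release(self, data_id):
        """Drop one reference; free the data entry when none remain."""
        entry = self.data[data_id]
        entry[1] -= 1
        if entry[1] == 0:
            del self.by_content[entry[0]]
            del self.data[data_id]

    def _evict_lru_tag(self):
        _, data_id = self.tags.popitem(last=False)
        self._release(data_id)

    def lookup(self, addr):
        """Return the block's contents on a hit (updating LRU), else None."""
        if addr not in self.tags:
            return None
        self.tags.move_to_end(addr)
        return self.data[self.tags[addr]][0]

    def fill(self, addr, contents):
        """Install a block for addr, sharing an identical copy if resident.

        Writes can be modeled the same way: addr is detached from its old,
        possibly shared, copy, so other addresses keep their old value.
        """
        contents = bytes(contents)
        if len(contents) != BLOCK_SIZE:
            raise ValueError("block must be BLOCK_SIZE bytes")
        if addr in self.tags:
            self._release(self.tags.pop(addr))
        while len(self.tags) >= self.num_tags:
            self._evict_lru_tag()
        data_id = self.by_content.get(contents)
        if data_id is None:
            # New unique value: evict LRU tags until a data entry is freed.
            while len(self.data) >= self.num_data:
                self._evict_lru_tag()
            data_id = self.next_id
            self.next_id += 1
            self.data[data_id] = [contents, 0]
            self.by_content[contents] = data_id
        self.data[data_id][1] += 1
        self.tags[addr] = data_id

    def effective_capacity_ratio(self):
        """Resident addresses per stored data block (1.0 means no sharing)."""
        return len(self.tags) / len(self.data) if self.data else 0.0


# Usage: six addresses holding zeros share a single data entry.
cache = DedupCache(num_tags=8, num_data=4)
zero_block = bytes(BLOCK_SIZE)
ones_block = bytes([1]) * BLOCK_SIZE
for addr in range(6):
    cache.fill(addr, zero_block)
cache.fill(6, ones_block)
print(len(cache.tags), len(cache.data))   # 7 2
print(cache.effective_capacity_ratio())   # 3.5
```

Decoupling tags from data is what allows more addresses to be resident than there are data entries. `effective_capacity_ratio` is only one simple way to express that gain; the paper's 112% figure may be measured differently, so the two numbers should not be compared directly.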

Index Terms

Last-level cache deduplication
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Information

Published In

ICS '14: Proceedings of the 28th ACM international conference on Supercomputing

June 2014

378 pages

ISBN:9781450326421

DOI:10.1145/2597652

General Chairs:
Arndt Bode, Technische Universität München and Leibniz Rechenzentrum, Germany
Michael Gerndt, Technische Universität München, Germany

Program Chairs:
Per Stenström, Chalmers University of Technology, Sweden
Lawrence Rauchwerger, Texas A&M University, USA
Barton Miller, University of Wisconsin, USA
Martin Schulz, Lawrence Livermore National Laboratory, USA

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICS'14

Sponsor:

SIGARCH

ICS'14: 2014 International Conference on Supercomputing

June 10 - 13, 2014

Munich, Germany

Acceptance Rates

ICS '14 Paper Acceptance Rate 34 of 160 submissions, 21%;

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Article Metrics

Total Citations: 48
Total Downloads: 1,136
Downloads (last 12 months): 58
Downloads (last 6 weeks): 10

Reflects downloads up to 10 Dec 2024.

Cited By

Nath A, Kapoor H (2024). AmLuCEP: Amalgamating LUT-based Compression and Adaptive Encoding Assisted Block Placement To Improve Lifetime of PCM-based Main Memories. ACM Transactions on Design Automation of Electronic Systems 29(6), pp. 1-24. DOI: 10.1145/3689334. Online publication date: 20-Aug-2024.
https://dl.acm.org/doi/10.1145/3689334

Cohen D, Cohen S, Naor D, Waddington D, Hershcovitch M (2024). Dictionary Based Cache Line Compression. Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, pp. 8-14. DOI: 10.1145/3655038.3665941. Online publication date: 8-Jul-2024.
https://dl.acm.org/doi/10.1145/3655038.3665941

Gu Y, Lu Y, Wu C, Li J, Guo M (2024). CKSM: An Efficient Memory Deduplication Method for Container-based Cloud Computing Systems. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 76-88. DOI: 10.1109/IPDPS57955.2024.00016. Online publication date: 27-May-2024.
https://doi.org/10.1109/IPDPS57955.2024.00016

Buyuktosunoglu A, Trilla D, Abali B, Berger D, Walters C, Lee J (2024). Enterprise-Class Cache Compression Design. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 996-1011. DOI: 10.1109/HPCA57654.2024.00080. Online publication date: 2-Mar-2024.
https://doi.org/10.1109/HPCA57654.2024.00080

Cheng S, Tang Z, Zeng S, Cui X, Li T (2024). PFDup: Practical Fuzzy Deduplication for Encrypted Multimedia Data. Journal of Industrial Information Integration 40:100613. DOI: 10.1016/j.jii.2024.100613. Online publication date: Jul-2024.
https://doi.org/10.1016/j.jii.2024.100613

Ahmad S, Arif M, Ahmad J, Nazim M, Mehfuz S (2024). Convergent encryption enabled secure data deduplication algorithm for cloud environment. Concurrency and Computation: Practice and Experience 36(21). DOI: 10.1002/cpe.8205. Online publication date: 21-Jun-2024.
https://doi.org/10.1002/cpe.8205

Li Y, Gao M (2023). Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 137-151. DOI: 10.1109/HPCA56546.2023.10071115. Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10071115

Zhang M, Hua Y (2023). Silo: Speculative Hardware Logging for Atomic Durability in Persistent Memory. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 651-663. DOI: 10.1109/HPCA56546.2023.10071034. Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10071034

Vij P, Dalip (2023). Comparative Analysis of Image Augmentation and Data Deduplication Techniques. Smart Trends in Computing and Communications, pp. 271-281. DOI: 10.1007/978-981-99-0769-4_26. Online publication date: 15-Jun-2023.
https://doi.org/10.1007/978-981-99-0769-4_26

Cheng S, Zeng S, Feng Y, Xiao J, Zheng H (2022). Secure Single-Server Fuzzy Deduplication without Interactive Proof-of-Ownership in Cloud. 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pp. 1525-1530. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00234. Online publication date: Dec-2022.
https://doi.org/10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00234
