DOI: 10.1145/2597652.2597655
research-article

Last-level cache deduplication

Published: 10 June 2014

Abstract

Caches are essential to the performance of modern microprocessors. Much recent work on last-level caches has focused on exploiting reference locality to improve efficiency. However, value redundancy is another source of potential improvement. We find that many blocks in the working set of typical benchmark programs have the same values. We propose cache deduplication that effectively increases last-level cache capacity. Rather than exploit specific value redundancy with compression, as in previous work, our scheme detects duplicate data blocks and stores only one copy of the data in a way that can be accessed through multiple physical addresses. We find that typical benchmarks exhibit significant value redundancy, far beyond the zero-content blocks one would expect in any program. Our deduplicated cache effectively increases capacity by an average of 112% compared to an 8MB last-level cache while reducing the physical area by 12.2%, yielding an average performance improvement of 15.2%.
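
The core idea, one stored copy of a data block reachable through several physical addresses, can be illustrated with a small software model. The sketch below is not the hardware design from the paper: the class and field names (DedupCache, DataEntry, tags, data, refcount) are hypothetical, the block content itself stands in for whatever hash-and-compare logic a real cache would use, and replacement policy, associativity, and capacity limits are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class DataEntry:
    block: bytes    # the single stored copy of the block contents
    refcount: int   # how many address tags currently point at this copy


class DedupCache:
    """Toy model (assumption, not the paper's design): tags share data blocks."""

    def __init__(self):
        self.tags = {}  # physical address -> content key
        self.data = {}  # content key -> DataEntry

    def write(self, addr, block):
        # Drop the old mapping first so its reference count stays accurate.
        if addr in self.tags:
            self._release(addr)
        # Assumption: the raw content is the lookup key. Hardware would
        # hash the block and then compare candidates to confirm a match.
        key = bytes(block)
        entry = self.data.get(key)
        if entry is None:
            self.data[key] = DataEntry(key, 1)  # first copy of this value
        else:
            entry.refcount += 1                 # duplicate: share the copy
        self.tags[addr] = key

    def read(self, addr):
        key = self.tags.get(addr)
        return None if key is None else self.data[key].block

    def _release(self, addr):
        key = self.tags.pop(addr)
        entry = self.data[key]
        entry.refcount -= 1
        if entry.refcount == 0:
            del self.data[key]  # no address refers to this value any more


cache = DedupCache()
cache.write(0x1000, bytes(64))      # a zero-filled 64-byte block
cache.write(0x2000, bytes(64))      # same contents at a different address
cache.write(0x3000, b"\x01" * 64)   # different contents
print(len(cache.tags), len(cache.data))  # prints "3 2"
```

Three addresses are tracked but only two data blocks are stored, which is the sense in which deduplication raises effective capacity. The reference count is one simple way to know when a shared block can be freed; the paper may handle this differently.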



    Published In

    ICS '14: Proceedings of the 28th ACM international conference on Supercomputing
    June 2014
    378 pages
    ISBN:9781450326421
    DOI:10.1145/2597652

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cache deduplication
    2. last-level caches

    Qualifiers

    • Research-article

    Conference

    ICS'14

    Acceptance Rates

    ICS '14 Paper Acceptance Rate 34 of 160 submissions, 21%;
    Overall Acceptance Rate 629 of 2,180 submissions, 29%
