
MORC: a manycore-oriented compressed cache

Published: 05 December 2015

Abstract

Cache compression has largely focused on improving single-stream application performance. In contrast, this work proposes using cache compression to improve application throughput on manycore processors, even at the potential cost of single-stream performance. The growing interest in throughput-oriented manycore architectures and the widening disparity between on-chip resources and off-chip bandwidth motivate a re-evaluation of using costly compression to conserve off-chip memory bandwidth. This work proposes MORC, a Many-core ORiented Compressed Cache architecture that compresses hundreds of cache lines together to maximize compression ratio. By looking across cache lines, MORC achieves compression ratios beyond those of schemes that compress only within a single cache line. MORC uses a novel log-based cache organization that selects cache lines filled into the cache close together in time as candidates to compress together. The proposed design compresses not only cache data but also cache tags together to save further storage. Future manycore processors will likely have reduced cache sizes and less bandwidth per core than current multicore processors. We evaluate MORC on such future manycore processors using the SPEC2006 benchmark suite. We find that MORC offers 37% more throughput than uncompressed caches and 17% more throughput than the next best cache compression scheme, while simultaneously reducing memory system energy by 17% compared to uncompressed caches.
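The intuition behind compressing across cache lines can be illustrated with a small sketch. It is not the MORC design: the abstract does not specify the compression algorithm or the log layout, so this example assumes 64-byte lines, uses DEFLATE from Python's `zlib` as a stand-in compressor, and invents the names `ToyLog`, `make_lines`, and `per_line_bytes` purely for illustration. It appends synthetic pointer-like lines to a log in fill order and compares the size of the jointly compressed log with the total size when each line is compressed on its own.

```python
import struct
import zlib

LINE_BYTES = 64  # assumed cache-line size in bytes


def make_lines(n):
    """Build n synthetic lines of eight 64-bit values near a shared base.

    Illustrative data only; real workloads will compress differently.
    """
    lines = []
    for i in range(n):
        words = [0x7F0000000000 + 8 * (8 * i + j) for j in range(8)]
        lines.append(struct.pack("<8Q", *words))
    return lines


class ToyLog:
    """Hypothetical log: lines are appended in fill order, compressed as one stream."""

    def __init__(self):
        self.tags = []          # tags kept in the same order as the data
        self.raw = bytearray()  # uncompressed log contents

    def fill(self, tag, line):
        if len(line) != LINE_BYTES:
            raise ValueError("unexpected line size")
        self.tags.append(tag)
        self.raw += line

    def compressed(self):
        # Stand-in compressor (DEFLATE); an assumption, not the paper's scheme.
        return zlib.compress(bytes(self.raw))

    def read(self, tag):
        # Reading one line means decompressing the stream that contains it,
        # which is why single-stream latency can suffer.
        index = self.tags.index(tag)
        data = zlib.decompress(self.compressed())
        return data[index * LINE_BYTES:(index + 1) * LINE_BYTES]


def per_line_bytes(lines):
    """Total size when every line is compressed independently."""
    return sum(len(zlib.compress(line)) for line in lines)


if __name__ == "__main__":
    lines = make_lines(256)
    log = ToyLog()
    for tag, line in enumerate(lines):
        log.fill(tag, line)
    print("raw bytes:", len(log.raw))
    print("per-line compressed bytes:", per_line_bytes(lines))
    print("log compressed bytes:", len(log.compressed()))
```

On this synthetic data the jointly compressed log is smaller than the sum of the independently compressed lines, because redundancy shared between lines is visible to the compressor only when the lines are in one stream. The `read` method also shows the cost side of the trade-off the abstract describes: fetching a single line requires decompressing data belonging to other lines.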




    Published In

    MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
    December 2015
    787 pages
    ISBN: 9781450340342
    DOI: 10.1145/2830772
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. caches
    2. compression
    3. manycore

    Qualifiers

    • Research-article

    Conference

    MICRO-48

    Acceptance Rates

    MICRO-48 Paper Acceptance Rate 61 of 283 submissions, 22%;
    Overall Acceptance Rate 484 of 2,242 submissions, 22%


    Cited By

    • (2024) Enterprise-Class Cache Compression Design. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 996-1011. DOI: 10.1109/HPCA57654.2024.00080. Online publication date: 2-Mar-2024
    • (2023) Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 718-730. DOI: 10.1109/HPCA56546.2023.10071089. Online publication date: Feb-2023
    • (2022) Exploiting Inter-block Entropy to Enhance the Compressibility of Blocks with Diverse Data. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1100-1114. DOI: 10.1109/HPCA53966.2022.00084. Online publication date: Apr-2022
    • (2021) Byte-Select Compression. ACM Transactions on Architecture and Code Optimization, 18:4, pp. 1-27. DOI: 10.1145/3462209. Online publication date: 3-Sep-2021
    • (2020) Safecracker: Leaking Secrets through Compressed Caches. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 1125-1140. DOI: 10.1145/3373376.3378453. Online publication date: 9-Mar-2020
    • (2019) Compress Objects, Not Cache Lines. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 229-242. DOI: 10.1145/3297858.3304006. Online publication date: 4-Apr-2019
    • (2019) Comparative Study on Data Compression Techniques in Cache to Promote Performance. 2019 IEEE International Conference on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS), pp. 1-6. DOI: 10.1109/INCOS45849.2019.8951324. Online publication date: Apr-2019
    • (2018) CABLE. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 312-325. DOI: 10.1109/MICRO.2018.00033. Online publication date: 20-Oct-2018
    • (2017) MBZip. ACM Transactions on Architecture and Code Optimization, 14:4, pp. 1-29. DOI: 10.1145/3151033. Online publication date: 5-Dec-2017
    • (2017) DICE. ACM SIGARCH Computer Architecture News, 45:2, pp. 627-638. DOI: 10.1145/3140659.3080243. Online publication date: 24-Jun-2017
