[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Byte-Select Compression

Published: 03 September 2021 Publication History

Abstract

Cache-block compression is a highly effective technique for both reducing accesses to lower levels in the memory hierarchy (cache compression) and minimizing data transfers (link compression). While many effective cache-block compression algorithms have been proposed, the design of these algorithms is largely ad hoc and manual and relies on human recognition of patterns. In this article, we take an entirely different approach. We introduce a class of “byte-select” compression algorithms, as well as an automated methodology for generating compression algorithms in this class. We argue that, based on upper bounds within the class, the study of this class of byte-select algorithms has potential to yield algorithms with better performance than existing cache-block compression algorithms. The upper bound we establish on the compression ratio is 2X that of any existing algorithm. We then offer a generalized representation of a subset of byte-select compression algorithms and search through the resulting space guided by a set of training data traces. Using this automated process, we find efficient and effective algorithms for various hardware applications. We find that the resulting algorithms exploit novel patterns that can inform future algorithm designs. The generated byte-select algorithms are evaluated against a separate set of traces and evaluations show that Byte-Select has a 23% higher compression ratio on average. While no previous algorithm performs best for all our data sets which include CPU and GPU applications, our generated algorithms do. Using an automated hardware generator for these algorithms, we show that their decompression and compression latency is one and two cycles respectively, much lower than any existing algorithm with a competitive compression ratio.

References

[1]
A. Arelakis and P. Stenstrom. 2014. SC2: A statistical compression cache scheme. In ACM SIGARCH Computer Architecture News 42 (2014), 145–156, Accessed: Jun. 30, 2017. [Online]. Available: http://dl.acm.org/citation.cfm?id=2665696.
[2]
A. R. Alameldeen and D. A. Wood. 2004. Adaptive cache compression for high-performance processors. In Computer Architecture, 2004. Proceedings. 31st Annual International Symposium on (2004), 212–223, Accessed: Jul. 02, 2017. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/1310776/.
[3]
Xi Chen, Lei Yang, R. P. Dick, Li Shang, and H. Lekatsas. 2010. C-Pack: A high-performance microprocessor cache compression algorithm. IEEE Trans. Very Large Scale Integr. VLSI Syst. 18, 8 (2010), 1196–1208, Aug. 2010.
[4]
G. Pekhimenko, V. Seshadri, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry. 2012. Base-delta-immediate compression: Practical data compression for on-chip caches. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (2012), 377–388.
[5]
S. Sardashti and D. A. Wood. 2013. Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching. In 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2013) 62–73.
[6]
B. Panda and A. Seznec. 2016. Dictionary sharing: An efficient cache compression scheme for compressed caches. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (2016), 1–12, Accessed: Jun. 30, 2017. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7783704/.
[7]
A. R. Alameldeen and D. A. Wood. 2004. Frequent pattern compression: A significance-based compression scheme for L2 caches. Dept Comp Scie Univ Wis.-Madison Tech Rep 1500, Accessed: Jul. 02, 2017. [Online]. Available: http://ftp.cs.wisc.edu/pub/techreports/2004/TR1500.pdf.
[8]
A. Arelakis, F. Dahlgren, and P. Stenstrom. 2015. HyComp: A hybrid cache compression method for selection of data-type-specific compression methods (date?), 38–49.
[9]
T. M. Nguyen and D. Wentzlaff. 2015. MORC: A manycore-oriented compressed cache. In 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2015), 76–88.
[10]
G. Pekhimenko. 2016. Practical data compression for modern memory hierarchies. Carnegie Mellon University (2016)
[11]
S. Sardashti, A. Seznec, and D. A. Wood. 2014. Skewed compressed caches. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (date), 331–342.
[12]
S. Sardashti, A. Seznec, and D. A. Wood. 2016. Yet another compressed cache: a low-cost yet effective compressed cache. ACM Trans Arch. Code Optim 13, 3 (2016), 27:1–27:25, Sep. 2016.
[13]
J. Dusser, T. Piquet, and A. Seznec. 2009. Zero-content augmented caches. In Proceedings of the 23rd International Conference on Supercomputing 46–55, Accessed: Jul. 02, 2017. [Online]. Available: http://dl.acm.org/citation.cfm?id=1542288.
[14]
V. Young, S. Kariyappa, and M. K. Qureshi. 2019. Enabling Transparent Memory-Compression for Commodity Memory Systems. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 570--581.
[15]
B. Abali et al. 2001. Memory expansion technology (MXT): Software support and performance. IBM J. Res. Dev 2001.
[16]
A. Shafiee, M. Taassori, R. Balasubramonian, and A. K. Davis. 2014. MemZip: Exploring unconventional benefits from memory compression. In High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on 638–649, Accessed: May 22, 2016. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6835972.
[17]
S. Kim, S. Lee, T. Kim, and J. Huh. 2017. Transparent dual memory architecture. Presented at the PACT’17, Accessed: Oct. 07, 2017. [Online]. Available: http://calab.kaist.ac.kr:8080/∼jhuh/papers/kim_pact17.pdf.
[18]
M. Rhu, M. O'Connor, N. Chatterjee, J. Pool, and S. W. Keckler. 2017. Compressing DMA engine: leveraging activation sparsity for training deep neural networks. ArXiv170501626 Cs, May 2017, [Online]. Available: http://arxiv.org/abs/1705.01626.
[19]
L. Yang, H. Lekatsas, and R. P. Dick. 2006. High-performance operating system controlled memory compression. In Proceedings of the 43rd annual Design Automation Conference 701–704, Accessed: Jul. 02, 2017. [Online]. Available: http://dl.acm.org/citation.cfm?id=1147086.
[20]
V. Sathish, M. J. Schulte, and N. S. Kim. 2012. Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques 325–334, Accessed: Jun. 30, 2017. [Online]. Available: http://dl.acm.org/citation.cfm?id=2370864.
[21]
J. Kim, M. Sullivan, E. Choukse, and M. Erez. 2016. Bit-plane compression: transforming data for better compression in many-core architectures. In Proceedings of the 43rd International Symposium on Computer Architecture, Piscataway, NJ, USA, 2016, 329–340.
[22]
A. Ghasemazar, P. Nair, and M. Lis. 2020. Thesaurus: Efficient cache compression via dynamic clustering. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, New York, NY, USA, 2020, 527–540.
[23]
Lattice (order). 2019. Wikipedia. Oct. 25, 2019, Accessed: Nov. 23, 2019. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Lattice_(order)&oldid=922960068.
[24]
S. Russel and P. Norvig. Artificial Intelligence: A Modern Approach, Third Edition.
[25]
J. L. Henning. 2006. SPEC CPU2006 Benchmark descriptions. SIGARCH Comput Arch. News 34, 4 (2006), 1–17.
[26]
I. Karlin. 2012. LULESH programming model and performance ports overview. LLNL-TR-608824, 1059462, Dec. 2012.
[27]
S. Che et al. 2009. Rodinia: A benchmark suite for heterogeneous computing. In 2009 IEEE International Symposium on Workload Characterization (IISWC), Austin, TX, USA, Oct. 2009, 44–54.
[28]
C. E. Shannon. A mathematical theory of communication. Bell Labs Tech. J. 27, 3, 55.
[29]
P.-A. Tsai and D. Sanchez. 2019. Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. New York, NY, 229--242.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization
ACM Transactions on Architecture and Code Optimization  Volume 18, Issue 4
December 2021
497 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3476575
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 September 2021
Accepted: 01 April 2021
Revised: 01 February 2021
Received: 01 September 2020
Published in TACO Volume 18, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Byte-select
  2. cache compression
  3. memory compression

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 1,832
    Total Downloads
  • Downloads (Last 12 months)656
  • Downloads (Last 6 weeks)57
Reflects downloads up to 20 Jan 2025

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media