More Web Proxy on the site http://driver.im/

research-article

Open access

Byte-Select Compression

Authors:

Mohammad Seyedzadeh,

Philip Bedoukian,

Bradford Beckmann,

David WoodAuthors Info & Claims

ACM Transactions on Architecture and Code Optimization (TACO), Volume 18, Issue 4

Article No.: 49, Pages 1 - 27

https://doi.org/10.1145/3462209

Published: 03 September 2021 Publication History

All formats PDF

Abstract

Cache-block compression is a highly effective technique for both reducing accesses to lower levels in the memory hierarchy (cache compression) and minimizing data transfers (link compression). While many effective cache-block compression algorithms have been proposed, the design of these algorithms is largely ad hoc and manual and relies on human recognition of patterns. In this article, we take an entirely different approach. We introduce a class of “byte-select” compression algorithms, as well as an automated methodology for generating compression algorithms in this class. We argue that, based on upper bounds within the class, the study of this class of byte-select algorithms has potential to yield algorithms with better performance than existing cache-block compression algorithms. The upper bound we establish on the compression ratio is 2X that of any existing algorithm. We then offer a generalized representation of a subset of byte-select compression algorithms and search through the resulting space guided by a set of training data traces. Using this automated process, we find efficient and effective algorithms for various hardware applications. We find that the resulting algorithms exploit novel patterns that can inform future algorithm designs. The generated byte-select algorithms are evaluated against a separate set of traces and evaluations show that Byte-Select has a 23% higher compression ratio on average. While no previous algorithm performs best for all our data sets which include CPU and GPU applications, our generated algorithms do. Using an automated hardware generator for these algorithms, we show that their decompression and compression latency is one and two cycles respectively, much lower than any existing algorithm with a competitive compression ratio.

References

[1]

A. Arelakis and P. Stenstrom. 2014. SC2: A statistical compression cache scheme. In ACM SIGARCH Computer Architecture News 42 (2014), 145–156, Accessed: Jun. 30, 2017. [Online]. Available: http://dl.acm.org/citation.cfm?id=2665696.

Digital Library

[2]

A. R. Alameldeen and D. A. Wood. 2004. Adaptive cache compression for high-performance processors. In Computer Architecture, 2004. Proceedings. 31st Annual International Symposium on (2004), 212–223, Accessed: Jul. 02, 2017. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/1310776/.

[3]

Xi Chen, Lei Yang, R. P. Dick, Li Shang, and H. Lekatsas. 2010. C-Pack: A high-performance microprocessor cache compression algorithm. IEEE Trans. Very Large Scale Integr. VLSI Syst. 18, 8 (2010), 1196–1208, Aug. 2010.

Digital Library

[4]

G. Pekhimenko, V. Seshadri, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry. 2012. Base-delta-immediate compression: Practical data compression for on-chip caches. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (2012), 377–388.

Digital Library

[5]

S. Sardashti and D. A. Wood. 2013. Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching. In 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2013) 62–73.

[6]

B. Panda and A. Seznec. 2016. Dictionary sharing: An efficient cache compression scheme for compressed caches. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (2016), 1–12, Accessed: Jun. 30, 2017. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7783704/.

[7]

A. R. Alameldeen and D. A. Wood. 2004. Frequent pattern compression: A significance-based compression scheme for L2 caches. Dept Comp Scie Univ Wis.-Madison Tech Rep 1500, Accessed: Jul. 02, 2017. [Online]. Available: http://ftp.cs.wisc.edu/pub/techreports/2004/TR1500.pdf.

[8]

A. Arelakis, F. Dahlgren, and P. Stenstrom. 2015. HyComp: A hybrid cache compression method for selection of data-type-specific compression methods (date?), 38–49.

Digital Library

[9]

T. M. Nguyen and D. Wentzlaff. 2015. MORC: A manycore-oriented compressed cache. In 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2015), 76–88.

Digital Library

[10]

G. Pekhimenko. 2016. Practical data compression for modern memory hierarchies. Carnegie Mellon University (2016)

[11]

S. Sardashti, A. Seznec, and D. A. Wood. 2014. Skewed compressed caches. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (date), 331–342.

[12]

S. Sardashti, A. Seznec, and D. A. Wood. 2016. Yet another compressed cache: a low-cost yet effective compressed cache. ACM Trans Arch. Code Optim 13, 3 (2016), 27:1–27:25, Sep. 2016.

Digital Library

[13]

J. Dusser, T. Piquet, and A. Seznec. 2009. Zero-content augmented caches. In Proceedings of the 23rd International Conference on Supercomputing 46–55, Accessed: Jul. 02, 2017. [Online]. Available: http://dl.acm.org/citation.cfm?id=1542288.

[14]

V. Young, S. Kariyappa, and M. K. Qureshi. 2019. Enabling Transparent Memory-Compression for Commodity Memory Systems. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 570--581.

[15]

B. Abali et al. 2001. Memory expansion technology (MXT): Software support and performance. IBM J. Res. Dev 2001.

[16]

A. Shafiee, M. Taassori, R. Balasubramonian, and A. K. Davis. 2014. MemZip: Exploring unconventional benefits from memory compression. In High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on 638–649, Accessed: May 22, 2016. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6835972.

[17]

S. Kim, S. Lee, T. Kim, and J. Huh. 2017. Transparent dual memory architecture. Presented at the PACT’17, Accessed: Oct. 07, 2017. [Online]. Available: http://calab.kaist.ac.kr:8080/∼jhuh/papers/kim_pact17.pdf.

[18]

M. Rhu, M. O'Connor, N. Chatterjee, J. Pool, and S. W. Keckler. 2017. Compressing DMA engine: leveraging activation sparsity for training deep neural networks. ArXiv170501626 Cs, May 2017, [Online]. Available: http://arxiv.org/abs/1705.01626.

[19]

L. Yang, H. Lekatsas, and R. P. Dick. 2006. High-performance operating system controlled memory compression. In Proceedings of the 43rd annual Design Automation Conference 701–704, Accessed: Jul. 02, 2017. [Online]. Available: http://dl.acm.org/citation.cfm?id=1147086.

[20]

V. Sathish, M. J. Schulte, and N. S. Kim. 2012. Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques 325–334, Accessed: Jun. 30, 2017. [Online]. Available: http://dl.acm.org/citation.cfm?id=2370864.

[21]

J. Kim, M. Sullivan, E. Choukse, and M. Erez. 2016. Bit-plane compression: transforming data for better compression in many-core architectures. In Proceedings of the 43rd International Symposium on Computer Architecture, Piscataway, NJ, USA, 2016, 329–340.

Digital Library

[22]

A. Ghasemazar, P. Nair, and M. Lis. 2020. Thesaurus: Efficient cache compression via dynamic clustering. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, New York, NY, USA, 2020, 527–540.

Digital Library

[23]

Lattice (order). 2019. Wikipedia. Oct. 25, 2019, Accessed: Nov. 23, 2019. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Lattice_(order)&oldid=922960068.

[24]

S. Russel and P. Norvig. Artificial Intelligence: A Modern Approach, Third Edition.

[25]

J. L. Henning. 2006. SPEC CPU2006 Benchmark descriptions. SIGARCH Comput Arch. News 34, 4 (2006), 1–17.

Digital Library

[26]

I. Karlin. 2012. LULESH programming model and performance ports overview. LLNL-TR-608824, 1059462, Dec. 2012.

[27]

S. Che et al. 2009. Rodinia: A benchmark suite for heterogeneous computing. In 2009 IEEE International Symposium on Workload Characterization (IISWC), Austin, TX, USA, Oct. 2009, 44–54.

Digital Library

[28]

C. E. Shannon. A mathematical theory of communication. Bell Labs Tech. J. 27, 3, 55.

[29]

P.-A. Tsai and D. Sanchez. 2019. Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. New York, NY, 229--242.

Digital Library

Index Terms

Byte-Select Compression
1. Computer systems organization
  1. Architectures
2. Hardware
  1. Communication hardware, interfaces and storage

Recommendations

Base-delta-immediate compression: practical data compression for on-chip caches
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Cache compression is a promising technique to increase on-chip cache capacity and to decrease on-chip and off-chip bandwidth usage. Unfortunately, directly applying well-known compression algorithms (usually implemented in software) leads to high ...
Base-victim compression: an opportunistic cache compression architecture
ISCA '16: Proceedings of the 43rd International Symposium on Computer Architecture

The memory wall has motivated many enhancements to cache management policies aimed at reducing misses. Cache compression has been proposed to increase effective cache capacity, which potentially reduces capacity and conflict misses. However, complexity ...
Base-victim compression: an opportunistic cache compression architecture
ISCA'16

The memory wall has motivated many enhancements to cache management policies aimed at reducing misses. Cache compression has been proposed to increase effective cache capacity, which potentially reduces capacity and conflict misses. However, complexity ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization

ACM Transactions on Architecture and Code Optimization Volume 18, Issue 4

December 2021

497 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3476575

Editor:
David Kaeli
Northeastern University, USA

Issue’s Table of Contents

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 September 2021

Accepted: 01 April 2021

Revised: 01 February 2021

Received: 01 September 2020

Published in TACO Volume 18, Issue 4

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
1,832
Total Downloads

Downloads (Last 12 months)656
Downloads (Last 6 weeks)57

Reflects downloads up to 20 Jan 2025

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents