Research Article
Open Access

MBZip: Multiblock Data Compression

Published: 05 December 2017

Abstract

Compression techniques at the last-level cache (LLC) and in DRAM play an important role in improving system performance by increasing their effective capacities. A compressed block in DRAM also reduces the transfer time over the memory bus to the caches, lowering the latency of an LLC miss. Usually, compression is achieved by exploiting data patterns present within a block, but applications can exhibit data locality that spreads across multiple consecutive data blocks. We observe that there is a significant opportunity for compressing multiple consecutive data blocks into one single block, both at the LLC and in DRAM. Our studies using 21 SPEC CPU applications show that, at the LLC, around 25% of the cache blocks (on average) can be compressed into a single cache block when grouped together in groups of 2 to 8 blocks. In DRAM, more than 30% of the columns residing in a single DRAM page can be compressed into one DRAM column when grouped together in groups of 2 to 6. Motivated by these observations, we propose MBZip, a mechanism that compresses multiple data blocks into one single block (called a zipped block), both at the LLC and in DRAM. At the cache, MBZip includes a simple tag structure to index into these zipped cache blocks, and the indexing does not incur any redirectional delay. At the DRAM, MBZip needs no changes to the address computation logic and works seamlessly with the conventional/existing logic. MBZip is a synergistic mechanism that coordinates these zipped blocks at the LLC and DRAM. Further, we explore silent writes at the DRAM and show that certain writes need not access memory when blocks are zipped. MBZip improves system performance by 21.9% on average, with a maximum of 90.3%, on a 4-core system.
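
To make the grouping idea above concrete, the following is a minimal software sketch, not the paper's hardware design: it walks a byte region in 64-byte blocks and greedily checks whether groups of 2 to 8 consecutive blocks can be squeezed into a single block. zlib stands in for MBZip's pattern-based hardware compressor, and the tag indexing and DRAM column layout of the real mechanism are omitted; the block size, group sizes, and function names are illustrative assumptions.

```python
# Sketch of multi-block zipping: fit N consecutive 64-byte blocks into one block.
# NOTE: zlib is only a stand-in compressor for illustration; MBZip uses a
# hardware pattern-based scheme and also covers tag indexing and DRAM column
# placement, which are not modeled here.
import zlib

BLOCK_SIZE = 64  # bytes per cache block / DRAM column (assumed)

def try_zip(blocks):
    """Return the compressed payload if the group of consecutive blocks
    fits in one block, otherwise None."""
    payload = zlib.compress(b"".join(blocks), level=9)
    return payload if len(payload) <= BLOCK_SIZE else None

def zip_region(data, group_sizes=(8, 6, 4, 2)):
    """Greedily walk a byte region (assumed to be a multiple of BLOCK_SIZE)
    at block granularity, preferring the largest group that still zips
    into a single block."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    i, groups = 0, []
    while i < len(blocks):
        for n in group_sizes:
            if i + n <= len(blocks) and try_zip(blocks[i:i + n]) is not None:
                groups.append((i, n))   # n blocks stored as one zipped block
                i += n
                break
        else:
            groups.append((i, 1))       # incompressible: stored as-is
            i += 1
    return groups

if __name__ == "__main__":
    # A zero-filled region zips 8-to-1; the repeated-but-dense pattern does not.
    region = bytes(BLOCK_SIZE * 8) + bytes(range(256)) * 2
    print(zip_region(region))
```

Running the example shows the zero-filled prefix collapsing into one zipped block while the remaining blocks fall back to uncompressed storage, mirroring the intent that a zipped block is formed only when an entire group fits.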

Supplementary Material

TACO1404-42 (taco1404-42.pdf)
Slide deck associated with this paper



Published In

ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 4
December 2017
600 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/3154814
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 December 2017
Accepted: 01 October 2017
Revised: 01 August 2017
Received: 01 September 2016
Published in TACO Volume 14, Issue 4


Author Tags

  1. CPU cache
  2. memory
  3. performance

Qualifiers

  • Research-article
  • Research
  • Refereed


Cited By

  • (2022) FlatPack. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 96-108. DOI: 10.1145/3559009.3569653
  • (2022) L2C: Combining Lossy and Lossless Compression on Memory and I/O. ACM Transactions on Embedded Computing Systems 21(1), 1-27. DOI: 10.1145/3481641
  • (2022) FlitZip: Effective Packet Compression for NoC in MultiProcessor System-on-Chip. IEEE Transactions on Parallel and Distributed Systems 33(1), 117-128. DOI: 10.1109/TPDS.2021.3090315
  • (2020) Optimized Lossless Embedded Compression for Mobile Multimedia Applications. Electronics 9(5), Article 868. DOI: 10.3390/electronics9050868
  • (2020) MemSZ. ACM Transactions on Architecture and Code Optimization 17(4), 1-25. DOI: 10.1145/3424668
  • (2020) Compacted CPU/GPU Data Compression via Modified Virtual Address Translation. Proceedings of the ACM on Computer Graphics and Interactive Techniques 3(2), 1-18. DOI: 10.1145/3406177
  • (2020) Safecracker: Leaking Secrets through Compressed Caches. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 1125-1140. DOI: 10.1145/3373376.3378453
  • (2018) CABLE. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 312-325. DOI: 10.1109/MICRO.2018.00033
