Research Article
Open Access

MBZip: Multiblock Data Compression

Published: 05 December 2017

Abstract

Compression techniques at the last-level cache (LLC) and in DRAM play an important role in improving system performance by increasing their effective capacities. A compressed block in DRAM also reduces the transfer time over the memory bus to the caches, lowering the latency of an LLC miss. Usually, compression is achieved by exploiting data patterns present within a block, but applications can exhibit data locality that spreads across multiple consecutive data blocks. We observe that there is a significant opportunity for compressing multiple consecutive data blocks into one single block, both at the LLC and in DRAM. Our studies using 21 SPEC CPU applications show that, at the LLC, around 25% of the cache blocks (on average) can be compressed into a single cache block when grouped together in groups of 2 to 8 blocks. In DRAM, more than 30% of the columns residing in a single DRAM page can be compressed into one DRAM column when grouped together in groups of 2 to 6. Motivated by these observations, we propose MBZip, a mechanism that compresses multiple data blocks into one single block (called a zipped block), both at the LLC and in DRAM. At the cache, MBZip includes a simple tag structure to index into these zipped cache blocks, and the indexing does not incur any redirectional delay. At the DRAM, MBZip needs no changes to the address computation logic and works seamlessly with the conventional/existing logic. MBZip is a synergistic mechanism that coordinates these zipped blocks at the LLC and DRAM. Further, we explore silent writes at the DRAM and show that certain writes need not access memory when blocks are zipped. MBZip improves system performance by 21.9% on average, with a maximum of 90.3%, on a 4-core system.
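
To make the grouping idea above concrete, the following is a minimal software sketch, not the paper's hardware design: it walks a byte region in 64-byte blocks and greedily checks whether groups of 2 to 8 consecutive blocks can be squeezed into a single block. zlib stands in for MBZip's pattern-based hardware compressor, and the tag indexing and DRAM column layout of the real mechanism are omitted; the block size, group sizes, and function names are illustrative assumptions.

```python
# Sketch of multi-block zipping: fit N consecutive 64-byte blocks into one block.
# NOTE: zlib is only a stand-in compressor for illustration; MBZip uses a
# hardware pattern-based scheme and also covers tag indexing and DRAM column
# placement, which are not modeled here.
import zlib

BLOCK_SIZE = 64  # bytes per cache block / DRAM column (assumed)

def try_zip(blocks):
    """Return the compressed payload if the group of consecutive blocks
    fits in one block, otherwise None."""
    payload = zlib.compress(b"".join(blocks), level=9)
    return payload if len(payload) <= BLOCK_SIZE else None

def zip_region(data, group_sizes=(8, 6, 4, 2)):
    """Greedily walk a byte region (assumed to be a multiple of BLOCK_SIZE)
    at block granularity, preferring the largest group that still zips
    into a single block."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    i, groups = 0, []
    while i < len(blocks):
        for n in group_sizes:
            if i + n <= len(blocks) and try_zip(blocks[i:i + n]) is not None:
                groups.append((i, n))   # n blocks stored as one zipped block
                i += n
                break
        else:
            groups.append((i, 1))       # incompressible: stored as-is
            i += 1
    return groups

if __name__ == "__main__":
    # A zero-filled region zips 8-to-1; the repeated-but-dense pattern does not.
    region = bytes(BLOCK_SIZE * 8) + bytes(range(256)) * 2
    print(zip_region(region))
```

Running the example shows the zero-filled prefix collapsing into one zipped block while the remaining blocks fall back to uncompressed storage, mirroring the intent that a zipped block is formed only when an entire group fits.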

Supplementary Material

TACO1404-42 (taco1404-42.pdf)
Slide deck associated with this paper



Published In

ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 4
December 2017
600 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/3154814
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 December 2017
Accepted: 01 October 2017
Revised: 01 August 2017
Received: 01 September 2016
Published in TACO Volume 14, Issue 4


Author Tags

  1. CPU cache
  2. memory
  3. performance

Qualifiers

  • Research-article
  • Research
  • Refereed


Cited By

  • (2022) FlatPack. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 96-108. DOI: 10.1145/3559009.3569653
  • (2022) L2C: Combining Lossy and Lossless Compression on Memory and I/O. ACM Transactions on Embedded Computing Systems 21(1), 1-27. DOI: 10.1145/3481641
  • (2022) FlitZip: Effective Packet Compression for NoC in MultiProcessor System-on-Chip. IEEE Transactions on Parallel and Distributed Systems 33(1), 117-128. DOI: 10.1109/TPDS.2021.3090315
  • (2020) Optimized Lossless Embedded Compression for Mobile Multimedia Applications. Electronics 9(5), Article 868. DOI: 10.3390/electronics9050868
  • (2020) MemSZ. ACM Transactions on Architecture and Code Optimization 17(4), 1-25. DOI: 10.1145/3424668
  • (2020) Compacted CPU/GPU Data Compression via Modified Virtual Address Translation. Proceedings of the ACM on Computer Graphics and Interactive Techniques 3(2), 1-18. DOI: 10.1145/3406177
  • (2020) Safecracker: Leaking Secrets through Compressed Caches. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 1125-1140. DOI: 10.1145/3373376.3378453
  • (2018) CABLE. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 312-325. DOI: 10.1109/MICRO.2018.00033
