Research Article
DOI: 10.1145/3649329.3655948

Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity

Published: 07 November 2024

Abstract

Bit-level sparsity in neural network models harbors immense untapped potential: eliminating redundant calculations on randomly distributed zero bits significantly boosts computational efficiency. Yet traditional digital SRAM-PIM architectures, constrained by their rigid crossbar structure, struggle to exploit this unstructured sparsity effectively. To address this challenge, we propose Dyadic Block PIM (DB-PIM), an algorithm-architecture co-design framework. First, we propose an algorithm coupled with a distinctive sparsity pattern, termed a dyadic block (DB), that preserves the random distribution of non-zero bits to maintain accuracy while restricting the number of these bits in each weight to improve regularity. Architecturally, we develop a custom PIM macro that includes dyadic block multiplication units (DBMUs) and Canonical Signed Digit (CSD)-based adder trees, specifically tailored for Multiply-Accumulate (MAC) operations. An input pre-processing unit (IPU) further improves performance and efficiency by capitalizing on block-wise input sparsity. Results show that the proposed co-design framework achieves a speedup of up to 7.69× and energy savings of 83.43%.
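
The full DB-PIM design is detailed in the paper itself; as a rough, hypothetical illustration of the bit-level arithmetic the abstract names, the Python sketch below encodes weights in Canonical Signed Digit (CSD) form and checks a per-weight budget on nonzero digits, which is the kind of constraint the dyadic block pattern imposes while leaving the positions of the nonzero bits unconstrained. The function names and the `max_nonzero` budget parameter are assumptions made for illustration, not the paper's implementation.

```python
# Hypothetical sketch: CSD encoding and a per-weight nonzero-digit budget.
# This illustrates the arithmetic behind CSD-based bit-level sparsity; it is
# not the DB-PIM design, whose dyadic block pattern is defined in the paper.

def to_csd(value: int) -> list[int]:
    """Encode a signed integer as CSD digits in {-1, 0, +1}, LSB first.

    CSD (the non-adjacent form) guarantees no two consecutive nonzero
    digits, which minimizes the number of nonzero digits and hence the
    number of partial products a MAC unit must accumulate.
    """
    digits = []
    v = value
    while v != 0:
        if v & 1:                # odd: emit a +1 or -1 digit
            d = 2 - (v % 4)      # v % 4 == 1 -> +1, v % 4 == 3 -> -1
            v -= d               # v is now even, so the division is exact
        else:
            d = 0
        digits.append(d)
        v //= 2
    return digits or [0]


def nonzero_digits(value: int) -> int:
    """Count nonzero CSD digits, i.e. the bit-level work a weight costs."""
    return sum(d != 0 for d in to_csd(value))


def fits_budget(weights: list[int], max_nonzero: int) -> bool:
    """Check a (hypothetical) per-weight cap on nonzero CSD digits,
    analogous to bounding per-weight work for regularity while leaving
    the positions of the nonzero digits free."""
    return all(nonzero_digits(w) <= max_nonzero for w in weights)


if __name__ == "__main__":
    for w in (7, 23, -15):
        ds = to_csd(w)
        # Verify the encoding reconstructs the original value.
        assert sum(d << i for i, d in enumerate(ds)) == w
        print(f"{w:4d}: CSD digits {ds}, nonzeros = {nonzero_digits(w)}")
    print("budget of 2 holds:", fits_budget([7, 23, -15], max_nonzero=2))
```

Because CSD forbids adjacent nonzero digits, it reduces the number of partial products per weight (e.g., 7 = 111 in binary needs three additions, but only two as CSD digits 100-1), which is why CSD-based adder trees pair naturally with bit-level zero skipping.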

    Published In

    DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
    June 2024
    2159 pages
    ISBN:9798400706011
    DOI:10.1145/3649329
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. bit-level sparsity
    2. SRAM
    3. PIM
    4. algorithm/architecture co-design

    Qualifiers

    • Research-article

    Funding Sources

• NSFC

    Conference

    DAC '24
    Sponsor:
    DAC '24: 61st ACM/IEEE Design Automation Conference
    June 23 - 27, 2024
San Francisco, CA, USA

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
