research-article

Towards a Truly Integrated Vector Processing Unit for Memory-bound Applications Based on a Cost-competitive Computational SRAM Design Solution

Published: 28 April 2022

Abstract

This article presents Computational SRAM (C-SRAM), a solution that combines In-Memory and Near-Memory Computing approaches. C-SRAM performs arithmetic, logic, and complex memory operations inside or next to the memory without transferring data over the system bus, which significantly reduces energy consumption. Operations are applied to large vectors of data that occupy an entire physical row of the C-SRAM array, which yields high performance gains. We introduce C-SRAM as an integrated vector processing unit that a scalar processor uses as an energy-efficient, high-performance co-processor. We detail the C-SRAM system design at three levels: (i) circuit design and silicon proof of concept, (ii) system interface and instruction set architecture, and (iii) high-level software programming and simulation. Experimental results on two complete memory-bound applications, AES and MobileNetV2, show that the C-SRAM implementation achieves up to 70× execution-time speedup and 37× energy reduction compared to a scalar architecture, and up to 17× execution-time speedup and 5× energy reduction compared to a SIMD architecture.
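To make the row-wide execution model concrete, the following toy Python model contrasts one C-SRAM vector operation with a scalar byte-by-byte loop. It is a minimal sketch under stated assumptions, not the paper's design: the class `CSRAMModel`, its methods `store`, `load`, `vxor`, and `vadd8`, the helper `scalar_xor`, the 16-byte row width, and the instruction and bus-transfer cost model are all hypothetical. The row width is chosen so that one row holds a 128-bit AES state, which lets `vxor` stand in for the AES AddRoundKey step (a bytewise XOR of the state with the round key).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical row width: 16 bytes (128 bits), chosen so one row holds one AES state.
ROW_BYTES = 16


@dataclass
class CSRAMModel:
    """Toy functional model of a computational SRAM (hypothetical, not the paper's ISA).

    Each row stores ROW_BYTES independent 8-bit lanes. A vector instruction reads two
    rows, combines them lane by lane, and writes the result back into the array, so
    the operands never travel over the system bus.
    """

    rows: dict[int, bytes] = field(default_factory=dict)
    instructions: int = 0  # vector instructions issued by the host processor

    def store(self, addr: int, data: bytes) -> None:
        # Initial data placement; not counted as a vector instruction in this model.
        if len(data) != ROW_BYTES:
            raise ValueError(f"a row holds exactly {ROW_BYTES} bytes")
        self.rows[addr] = bytes(data)

    def load(self, addr: int) -> bytes:
        return self.rows[addr]

    def vxor(self, dst: int, a: int, b: int) -> None:
        # One instruction XORs every lane of row a with the same lane of row b.
        self.rows[dst] = bytes(x ^ y for x, y in zip(self.rows[a], self.rows[b]))
        self.instructions += 1

    def vadd8(self, dst: int, a: int, b: int) -> None:
        # One instruction adds all lanes; each 8-bit lane wraps around independently.
        self.rows[dst] = bytes((x + y) & 0xFF for x, y in zip(self.rows[a], self.rows[b]))
        self.instructions += 1


def scalar_xor(a: bytes, b: bytes) -> tuple[bytes, int, int]:
    """Scalar baseline: per byte, load, load, XOR, store (assumed cost model)."""
    out = bytearray(len(a))
    instructions = bus_transfers = 0
    for i in range(len(a)):
        out[i] = a[i] ^ b[i]
        instructions += 4   # two loads, one XOR, one store
        bus_transfers += 3  # the two loads and the store cross the bus
    return bytes(out), instructions, bus_transfers


# AES AddRoundKey: bytewise XOR of the 16-byte state with the round key.
state = bytes(range(16))
round_key = bytes([0xA5] * 16)

mem = CSRAMModel()
mem.store(0, state)
mem.store(1, round_key)
mem.vxor(2, 0, 1)  # the whole state is processed by a single row-wide instruction

reference, scalar_instructions, scalar_bus = scalar_xor(state, round_key)
assert mem.load(2) == reference
print(mem.instructions, scalar_instructions, scalar_bus)  # 1 64 48
```

The printed counts come from the assumed cost model, not from the paper's measurements. They only illustrate the two effects the abstract credits: one instruction covers a whole row instead of one element, and the operands stay in the array instead of crossing the system bus.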

Information

Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 18, Issue 2
April 2022
411 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3508462
Editor: Ramesh Karri
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 April 2022
Accepted: 01 September 2021
Received: 01 September 2020
Published in JETC Volume 18, Issue 2

Author Tags

  1. Computational SRAM
  2. in- and near-memory computing
  3. simulation
  4. AES
  5. MobileNetV2

Qualifiers

  • Research-article
  • Refereed

Bibliometrics

  • Downloads (last 12 months): 236
  • Downloads (last 6 weeks): 24

Reflects downloads up to 13 Dec 2024.

Cited By

  • (2024) A Hardware Instruction Generation Mechanism for Energy-Efficient Computational Memories. 2024 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–5. DOI: 10.1109/ISCAS58744.2024.10557870. Online publication date: 19-May-2024.
  • (2024) Emerging Technologies for Memory-Centric Computing. In Design and Applications of Emerging Computer Systems, pp. 3–29. DOI: 10.1007/978-3-031-42478-6_1. Online publication date: 14-Jan-2024.
  • (2023) FeFET based Logic-in-Memory design methodologies, tools and open challenges. 2023 IFIP/IEEE 31st International Conference on Very Large Scale Integration (VLSI-SoC), pp. 1–6. DOI: 10.1109/VLSI-SoC57769.2023.10321901. Online publication date: 16-Oct-2023.
  • (2023) Compute-In-Place Serial FeRAM: Enhancing Performance, Efficiency and Adaptability in Critical Embedded Systems. 2023 IFIP/IEEE 31st International Conference on Very Large Scale Integration (VLSI-SoC), pp. 1–6. DOI: 10.1109/VLSI-SoC57769.2023.10321864. Online publication date: 16-Oct-2023.
  • (2023) Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator. IEEE Computer Architecture Letters, pp. 1–4. DOI: 10.1109/LCA.2023.3341389. Online publication date: 2023.
