DOI: 10.1145/3218603.3218605

Input-Splitting of Large Neural Networks for Power-Efficient Accelerator with Resistive Crossbar Memory Array

Published: 23 July 2018

Abstract

Resistive Crossbar memory Arrays (RCAs) have been gaining interest as a promising platform for implementing Convolutional Neural Networks (CNNs). One of the major challenges in RCA-based design is that the number of rows in an RCA is often smaller than the number of input neurons in a layer. Previous works used high-resolution Analog-to-Digital Converters (ADCs) to compute the partial weighted sum in each array and merged the partial sums from multiple arrays outside the RCAs. However, such an approach suffers from significant power consumption due to the high-resolution ADCs it requires. In this paper, we propose a methodology to construct a large CNN with multiple RCAs more efficiently. By splitting the input feature map and retraining the CNN with proper initialization, we demonstrate that any CNN model can be represented with multiple arrays without using intermediate partial sums. The experimental results show that the ADC power of the proposed design is 32x smaller, and the total chip power 3x smaller, than those of the baseline design.
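To make the contrast concrete, here is a minimal sketch of the two mappings the abstract compares: the conventional scheme, which digitizes each array's raw partial weighted sum so the sums can be merged exactly, and input-splitting, where each sub-array applies the activation locally so no intermediate partial sum leaves the array. This is a NumPy toy model, not the paper's implementation: the uniform `quantize` stand-in for an ADC, the 128-row array size, the 8-bit vs. 3-bit ADC resolutions, and the merging of the split outputs by summation are all illustrative assumptions.

```python
import numpy as np

def quantize(x, bits, x_max=4.0):
    # Uniform quantizer standing in for an ADC of the given resolution
    # (illustrative assumption; real ADC behavior is more involved).
    levels = 2 ** bits - 1
    x = np.clip(x, -x_max, x_max)
    return np.round((x + x_max) / (2 * x_max) * levels) / levels * (2 * x_max) - x_max

def baseline_layer(x, W, rows, adc_bits=8):
    # Conventional mapping: each crossbar outputs a raw partial weighted
    # sum that must survive digitization intact, so a high-resolution ADC
    # is needed before the partial sums are merged and activated.
    idx = np.array_split(np.arange(x.size), -(-x.size // rows))
    partials = [quantize(W[:, s] @ x[s], adc_bits) for s in idx]
    return np.maximum(sum(partials), 0.0)  # ReLU applied after merging

def input_split_layer(x, Ws, rows, adc_bits=3):
    # Input-splitting (sketch): each sub-array applies the activation
    # locally and emits a coarsely quantized output, so no intermediate
    # partial sum leaves the array; retraining the split weights Ws with
    # proper initialization compensates for the approximation.
    idx = np.array_split(np.arange(x.size), -(-x.size // rows))
    outs = [quantize(np.maximum(W @ x[s], 0.0), adc_bits)
            for W, s in zip(Ws, idx)]
    return sum(outs)

rng = np.random.default_rng(0)
x = rng.random(512)                   # 512 input neurons > 128 array rows
W = rng.normal(0.0, 0.05, (64, 512))  # one wide layer, 64 output neurons
Ws = [W[:, s] for s in np.array_split(np.arange(512), 4)]
print(baseline_layer(x, W, rows=128).shape)      # (64,)
print(input_split_layer(x, Ws, rows=128).shape)  # (64,)
```

The sketch highlights where the quantizer sits: the baseline must digitize raw partial sums at high resolution because any quantization error propagates into the exact merge, whereas the split layer quantizes already-activated outputs coarsely, which is consistent with the 32x ADC power reduction the abstract reports. Whether the split outputs are summed, as here, or fed to the next layer as separate feature maps is a design choice; summation is merely the simplest stand-in.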



    Published In

    cover image ACM Conferences
    ISLPED '18: Proceedings of the International Symposium on Low Power Electronics and Design
    July 2018
    327 pages
    ISBN:9781450357043
    DOI:10.1145/3218603


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

1. neural networks
2. resistive random-access memory
3. vector-matrix multiplication acceleration

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ISLPED '18

    Acceptance Rates

    Overall Acceptance Rate 398 of 1,159 submissions, 34%


    Cited By

    • (2024) Hardware/Software Co-Design With ADC-Less In-Memory Computing Hardware for Spiking Neural Networks. IEEE Transactions on Emerging Topics in Computing 12(1), 35-47. DOI: 10.1109/TETC.2023.3316121. Online publication date: Jan-2024.
    • (2024) A 44.2-TOPS/W CNN Processor With Variation-Tolerant Analog Datapath and Variation Compensating Circuit. IEEE Journal of Solid-State Circuits 59(5), 1603-1611. DOI: 10.1109/JSSC.2023.3321643. Online publication date: May-2024.
    • (2024) Lightening-Transformer: A Dynamically-Operated Optically-Interconnected Photonic Transformer Accelerator. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 686-703. DOI: 10.1109/HPCA57654.2024.00059. Online publication date: 2-Mar-2024.
    • (2023) ARBiS: A Hardware-Efficient SRAM CIM CNN Accelerator With Cyclic-Shift Weight Duplication and Parasitic-Capacitance Charge Sharing for AI Edge Application. IEEE Transactions on Circuits and Systems I: Regular Papers 70(1), 364-377. DOI: 10.1109/TCSI.2022.3215535. Online publication date: Jan-2023.
    • (2023) An Energy-Efficient Inference Engine for a Configurable ReRAM-Based Neural Network Accelerator. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42(3), 740-753. DOI: 10.1109/TCAD.2022.3184464. Online publication date: Mar-2023.
    • (2022) Towards ADC-Less Compute-In-Memory Accelerators for Energy Efficient Deep Learning. 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 624-627. DOI: 10.23919/DATE54114.2022.9774573. Online publication date: 14-Mar-2022.
    • (2022) Compute-in-Memory Technologies and Architectures for Deep Learning Workloads. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 30(11), 1615-1630. DOI: 10.1109/TVLSI.2022.3203583. Online publication date: Nov-2022.
    • (2021) Signal Integrity Modeling and Analysis of Large-Scale Memristor Crossbar Array in a High-Speed Neuromorphic System for Deep Neural Network. IEEE Transactions on Components, Packaging and Manufacturing Technology 11(7), 1122-1136. DOI: 10.1109/TCPMT.2021.3092740. Online publication date: Jul-2021.
    • (2021) Leveraging Noise and Aggressive Quantization of In-Memory Computing for Robust DNN Hardware Against Adversarial Input and Weight Attacks. 2021 58th ACM/IEEE Design Automation Conference (DAC), 559-564. DOI: 10.1109/DAC18074.2021.9586233. Online publication date: 5-Dec-2021.
    • (2021) Maximizing Parallel Activation of Word-Lines in MRAM-Based Binary Neural Network Accelerators. IEEE Access 9, 141961-141969. DOI: 10.1109/ACCESS.2021.3121011. Online publication date: 2021.