
DEF: Differential Encoding of Featuremaps for Low Power Convolutional Neural Network Accelerators

Published: 29 January 2021 · DOI: 10.1145/3394885.3431576

Abstract

As the deployment of Deep Learning applications on edge devices becomes increasingly prominent, power consumption is emerging as a limiting factor on the performance these computational platforms can achieve. A significant source of power consumption in edge-based machine learning accelerators is off-chip memory transactions. For Convolutional Neural Network (CNN) workloads, which dominate deep learning applications, these transactions are typically attributable to the store and recall of feature-maps. There is therefore a need to explicitly reduce the power dissipated by these transactions whilst minimising the overhead required to do so. This work proposes a Differential Encoding of Feature-maps (DEF) scheme that minimises activity on the memory data bus, specifically for CNN workloads. The coding scheme uses domain-specific knowledge, exploiting the statistics of feature-maps alongside knowledge of the data types commonly used in machine learning accelerators, to reduce power consumption. DEF outperforms recent state-of-the-art coding schemes with significantly less overhead, achieving up to a 50% reduction in activity across a number of modern CNNs.
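The full DEF algorithm is described only in the paper itself, but the principle the abstract relies on can be made concrete: switching activity on a data bus is the Hamming distance between consecutive words, and because neighbouring feature-map values are statistically correlated, remapping them through a differential code leaves most bus wires idle. The Python sketch below illustrates this general idea only; the sign-magnitude remapping, the 8-bit word width and the synthetic stream are illustrative assumptions, not the authors' scheme, which additionally exploits accelerator data-type knowledge and would need a matching decoder on the receiving side.

```python
import numpy as np

WIDTH = 8                     # assumed bus width: 8-bit quantised feature-maps
MASK = (1 << WIDTH) - 1

def toggles(stream):
    """Switching activity: total bit flips between consecutive bus words."""
    return sum(bin((a ^ b) & MASK).count("1") for a, b in zip(stream, stream[1:]))

def diff_sign_magnitude(stream):
    """Illustrative differential encoder: transmit the difference between
    consecutive values in sign-magnitude form, so that small differences of
    either sign set only a few low-order bits plus one sign bit, instead of
    the long two's-complement carry runs that dominate raw transfers."""
    prev, out = 0, []
    for x in stream:
        d = int(x) - prev
        prev = int(x)
        mag = min(abs(d), MASK >> 1)              # clamp magnitude to 7 bits
        out.append(((1 << (WIDTH - 1)) if d < 0 else 0) | mag)
    return out

# Correlated synthetic stream standing in for one row of a feature-map:
# neighbouring activations tend to vary smoothly, which is exactly the
# statistical property a differential code exploits.
t = np.arange(4096)
fmap = (128 + 120 * np.sin(t / 60)).astype(int) & MASK

print("raw stream:         ", toggles(list(fmap)), "toggles")
print("differential encode:", toggles(diff_sign_magnitude(fmap)), "toggles")
```

On smoothly varying data the encoded stream flips far fewer wires per transfer than the raw one; real feature-maps add further effects (ReLU-induced zeros, channel boundaries) that a production scheme such as DEF must handle explicitly.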




Published In

ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference
January 2021
930 pages
ISBN:9781450379991
DOI:10.1145/3394885

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Activity Coding
  2. Neural Networks
  3. Power Optimisation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ASPDAC '21

Acceptance Rates

ASPDAC '21: 111 of 368 submissions accepted (30%)
Overall: 466 of 1,454 submissions accepted (32%)


Cited By

  • (2024) Accelerating Strawberry Ripeness Classification Using a Convolution-Based Feature Extractor along with an Edge AI Processor. Electronics 13, 2 (344). DOI: 10.3390/electronics13020344. Online publication date: 13-Jan-2024.
  • (2023) Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation. ACM Transactions on Design Automation of Electronic Systems 28, 6 (1-31). DOI: 10.1145/3611673. Online publication date: 16-Oct-2023.
  • (2023) Multiple-Deep Neural Network Accelerators for Next-Generation Artificial Intelligence Systems. Computer 56, 3 (70-79). DOI: 10.1109/MC.2022.3176845. Online publication date: Mar-2023.
  • (2021) StreamSVD: Low-rank Approximation and Streaming Accelerator Co-design. 2021 International Conference on Field-Programmable Technology (ICFPT) (1-9). DOI: 10.1109/ICFPT52863.2021.9609813. Online publication date: 6-Dec-2021.
