
research-article

Open access

EDeN: Enabling Low-Power CNN Inference on Edge Devices Using Prefetcher-assisted NVM Systems

Authors:

Hyun Kim

ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design

Pages 1 - 6

https://doi.org/10.1145/3665314.3670801

Published: 09 September 2024

Abstract

The accuracy of convolutional neural networks (CNNs) has improved significantly over the years. Meanwhile, owing to the portability and usefulness of edge devices, demand for artificial intelligence (AI) applications on edge computing devices has been soaring. Accordingly, CNN inference has become one of the mainstream AI applications on edge devices. However, as technology nodes scale down, the steadily increasing leakage power of edge devices hinders the wide deployment of CNN inference applications.

In this work, we focus on reducing the power consumption of main memory, which accounts for a considerable share of the power consumed during CNN inference. In particular, we observed that memory spends most of its time in the idle state during computationally intensive CNN inference. To achieve low-power CNN inference on edge devices, we first use next-generation nonvolatile memory (NVM), rather than dynamic random-access memory (DRAM), as the main memory device specifically for CNN inference tasks. To mitigate the increased latency caused by NVM, we propose a novel prefetcher that predictably manages the locality-specific demands of CNN models while leveraging existing resources in commercial NVM system models. Furthermore, using this prefetcher-based approach, we optimize write allocation to enhance data reuse and energy efficiency in CNN workloads. In simulation, our design improves energy efficiency by 50% with negligible performance impact compared with conventional DRAM-based platforms.
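To illustrate why prefetching can hide part of the longer NVM read latency, the following is a minimal Python sketch of a classic PC-indexed stride prefetcher (a reference-prediction-table style design). It is not the EDeN prefetcher, whose design is described in the full paper. The class and function names, the 64-byte line size, the prefetch degree, and the unbounded, zero-latency prefetch buffer are all hypothetical assumptions made for illustration. The sketch relies on the observation that CNN layers typically read weights and feature maps in regular, largely sequential patterns, so a stride detector can issue reads ahead of demand.

```python
from __future__ import annotations

from dataclasses import dataclass

LINE_SIZE = 64  # bytes per memory line; an assumed value for illustration


@dataclass
class RPTEntry:
    """One reference-prediction-table entry, tracked per load instruction (PC)."""
    last_addr: int
    stride: int = 0
    confident: bool = False


class StridePrefetcher:
    """Hypothetical PC-indexed stride prefetcher (not the EDeN design)."""

    def __init__(self, degree: int = 2) -> None:
        self.degree = degree  # number of strides ahead to prefetch
        self.table: dict[int, RPTEntry] = {}

    def access(self, pc: int, addr: int) -> list[int]:
        """Record a demand access and return the addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            # First time this load is seen: remember its address only.
            self.table[pc] = RPTEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        # Trust a stride only after the same non-zero stride repeats back to back.
        entry.confident = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_addr = addr
        if not entry.confident:
            return []
        return [addr + stride * k for k in range(1, self.degree + 1)]


def count_prefetch_hits(addresses: list[int], pc: int = 0x400, degree: int = 2) -> int:
    """Count demand accesses already covered by an earlier prefetch.

    Assumes an unbounded prefetch buffer and zero prefetch latency.
    """
    pf = StridePrefetcher(degree)
    prefetched: set[int] = set()
    hits = 0
    for addr in addresses:
        if addr in prefetched:
            hits += 1
        prefetched.update(pf.access(pc, addr))
    return hits


# A layer streaming ten consecutive weight lines from main memory.
weights = [0x1000 + LINE_SIZE * i for i in range(10)]
print(count_prefetch_hits(weights))  # 7: all but the first three accesses are covered
```

The prefetcher needs three accesses to confirm a stable stride, after which every remaining access in the stream is covered. An irregular pattern, such as alternating between two addresses, never establishes a repeated stride, so nothing is prefetched; this is where a plain stride detector stops helping and more workload-aware designs become relevant.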



Information

Published In

ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design

August 2024

384 pages

ISBN:9798400706882

DOI:10.1145/3665314

Chair: Pascal Meinerzhagen
Program Chair: Kapil Dev
Program Co-chair: Jerald Yoo

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE CAS
IEEE EDA

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Conference

ISLPED '24

Sponsor:

SIGDA

ISLPED '24: 29th ACM/IEEE International Symposium on Low Power Electronics and Design

August 5 - 7, 2024

Newport Beach, CA, USA

Acceptance Rates

Overall Acceptance Rate 398 of 1,159 submissions, 34%


Bibliometrics

Article Metrics

Total Citations: 0
Total Downloads: 103
Downloads (last 12 months): 103
Downloads (last 6 weeks): 43

Reflects downloads up to 01 Jan 2025
