DOI: 10.1145/3665314.3670801
research-article
Open access

EDeN: Enabling Low-Power CNN Inference on Edge Devices Using Prefetcher-assisted NVM Systems

Published: 09 September 2024

Abstract

The accuracy of convolutional neural networks (CNNs) has improved significantly over the years. Meanwhile, because edge devices are highly portable and useful, demand for artificial intelligence (AI) applications on edge computing devices has been soaring. Accordingly, CNN inference has become one of the mainstream AI applications on edge devices. However, as the technology node scales down, the continually increasing leakage power of edge devices hinders the wide deployment of CNN inference applications.
In this work, we focus on reducing the power consumption of main memory, which consumes considerable power in CNN inference. In particular, we observe that the memory idle state is dominant in computationally intensive CNN inference. To achieve low-power CNN inference on edge devices, we first use next-generation nonvolatile memory (NVM), rather than dynamic random-access memory (DRAM), as the main memory device, only for CNN inference tasks. To mitigate the increased latency caused by NVM, we propose a novel prefetcher designed to predictably manage the locality-specific demands of CNN models while leveraging existing resources in modern commercial NVM system models. Furthermore, using this prefetcher-based approach, we optimize write allocation to enhance data reuse and energy efficiency in CNN workloads. In simulation, our design improves energy efficiency by 50% with a negligible impact on performance compared with conventional DRAM-based platforms.
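
The observation that the memory idle state dominates can be illustrated with a toy energy model that splits main-memory energy into a time-proportional idle term and a per-access term. This is a minimal sketch, not the paper's simulator or method: the `MemoryModel` class, the function names, and every numeric value below are hypothetical placeholders chosen only for illustration, and the model ignores NVM latency effects and the prefetcher itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MemoryModel:
    """Toy main-memory energy model. All values are hypothetical placeholders."""
    name: str
    idle_power_mw: float         # background power while idle (e.g., standby, refresh)
    energy_per_access_nj: float  # average dynamic energy per memory access


def memory_energy_mj(mem: MemoryModel, runtime_s: float, accesses: int) -> Tuple[float, float]:
    """Return (idle_energy_mJ, access_energy_mJ) for one inference run."""
    idle_mj = mem.idle_power_mw * runtime_s                # mW * s = mJ
    access_mj = mem.energy_per_access_nj * accesses / 1e6  # nJ -> mJ
    return idle_mj, access_mj


def idle_fraction(mem: MemoryModel, runtime_s: float, accesses: int) -> float:
    """Share of total memory energy spent in the idle term."""
    idle_mj, access_mj = memory_energy_mj(mem, runtime_s, accesses)
    total = idle_mj + access_mj
    return idle_mj / total if total > 0 else 0.0


# Hypothetical parameters, NOT taken from the paper or any data sheet.
DRAM = MemoryModel("DRAM", idle_power_mw=100.0, energy_per_access_nj=10.0)
NVM = MemoryModel("NVM", idle_power_mw=5.0, energy_per_access_nj=30.0)

if __name__ == "__main__":
    runtime_s, accesses = 1.0, 1_000_000  # assumed compute-bound run with sparse memory traffic
    for mem in (DRAM, NVM):
        idle_mj, access_mj = memory_energy_mj(mem, runtime_s, accesses)
        print(f"{mem.name}: idle={idle_mj:.1f} mJ, access={access_mj:.1f} mJ, "
              f"idle share={idle_fraction(mem, runtime_s, accesses):.2f}")
```

Under these assumed numbers, a long run with relatively few accesses makes the idle term the larger one for the DRAM-like model, which is the regime the abstract describes. A memory with lower idle power can then come out ahead even if each access costs more energy, provided the added access latency does not stretch the runtime much.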

Published In

ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design
August 2024
384 pages
ISBN:9798400706882
DOI:10.1145/3665314
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural network (CNN)
  2. non-volatile memory (NVM)
  3. low power
  4. edge device
  5. prefetcher

Qualifiers

  • Research-article

Conference

ISLPED '24

Acceptance Rates

Overall Acceptance Rate 398 of 1,159 submissions, 34%
