DOI: 10.1145/2830772.2830798
research-article

Execution time prediction for energy-efficient hardware accelerators

Published: 05 December 2015

Abstract

Many mobile applications use hardware accelerators for computation-intensive tasks. These tasks often involve real-time user interaction and must finish within a fixed time budget to keep the user experience smooth. In this paper, we propose a DVFS (dynamic voltage and frequency scaling) framework for hardware accelerators that serve such interactive workloads. The framework automatically generates a predictor for each accelerator that predicts its execution time, and sets a DVFS level that just meets the response-time requirement. Our evaluation shows that, compared to running each accelerator at a constant frequency, the framework achieves 36.7% energy savings on average across a set of accelerators while missing only 0.4% of the deadlines. These savings are only 3.8% less than those of an optimal DVFS scheme. We also show that, with the introduction of a boost level, deadline misses can be eliminated completely while still achieving 36.4% energy savings.
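The idea of setting a DVFS level that just meets the response-time requirement can be illustrated with a minimal sketch. It is not the paper's implementation: the linear predictor, the function names `predict_cycles` and `pick_frequency`, the `margin` parameter, and the example frequency levels are all hypothetical, and the sketch assumes that the cycle count of a job does not change with clock frequency.

```python
from typing import Sequence


def predict_cycles(features: Sequence[float],
                   weights: Sequence[float],
                   bias: float) -> float:
    """Hypothetical linear predictor: estimated cycle count for one job.

    `features` stands for job-specific values known before the job runs;
    `weights` and `bias` stand for coefficients fitted offline.
    """
    return sum(w * x for w, x in zip(weights, features)) + bias


def pick_frequency(predicted_cycles: float,
                   deadline_s: float,
                   levels_hz: Sequence[float],
                   margin: float = 0.0) -> float:
    """Return the lowest frequency whose predicted run time meets the deadline.

    `margin` (an assumption of this sketch) shrinks the time budget to
    absorb prediction error. If no level is fast enough, the highest
    level is returned, which plays the role of a boost level here.
    """
    budget_s = deadline_s * (1.0 - margin)
    for f in sorted(levels_hz):
        # Assumes execution time = cycles / frequency.
        if predicted_cycles / f <= budget_s:
            return f
    return max(levels_hz)


# Example with made-up numbers: three levels and a 20 ms deadline.
LEVELS_HZ = [100e6, 200e6, 400e6]
chosen = pick_frequency(3e6, 0.02, LEVELS_HZ)
```

With these made-up numbers, 3 million cycles take 30 ms at 100 MHz and 15 ms at 200 MHz, so the sketch selects 200 MHz, the slowest level that still fits the 20 ms budget. Running slower than necessary misses the deadline, and running faster wastes energy, which is why the lowest feasible level is the target.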



    Information & Contributors

    Information

    Published In

    MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
    December 2015
    787 pages
    ISBN:9781450340342
    DOI:10.1145/2830772

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. DVFS
    2. energy efficiency
    3. hardware accelerator

    Qualifiers

    • Research-article

    Funding Sources

    • Office of Naval Research (ONR)

    Conference

    MICRO-48

    Acceptance Rates

    MICRO-48 Paper Acceptance Rate 61 of 283 submissions, 22%;
    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 26
    • Downloads (Last 6 weeks): 5
    Reflects downloads up to 10 Dec 2024

    Citations

    Cited By

    • (2023) FARSI: An Early-stage Design Space Exploration Framework to Tame the Domain-specific System-on-chip Complexity. ACM Transactions on Embedded Computing Systems 22:2 (1-35). DOI: 10.1145/3544016. Online publication date: 24-Jan-2023
    • (2022) SHAPE. Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design (1-9). DOI: 10.1145/3508352.3549409. Online publication date: 30-Oct-2022
    • (2020) Deep Learning. Deep Learning Techniques and Optimization Strategies in Big Data Analytics (124-141). DOI: 10.4018/978-1-7998-1192-3.ch008. Online publication date: 2020
    • (2019) Digital Signal Processing Accelerator for RISC-V. 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS) (703-706). DOI: 10.1109/ICECS46596.2019.8964670. Online publication date: Nov-2019
    • (2018) A QOS-aware dynamic resources management for data center. Proceedings of the International Conference on Geoinformatics and Data Analysis (157-162). DOI: 10.1145/3220228.3220251. Online publication date: 20-Apr-2018
    • (2018) Active forwarding. Proceedings of the 55th Annual Design Automation Conference (1-6). DOI: 10.1145/3195970.3195984. Online publication date: 24-Jun-2018
    • (2018) Mobilizing the micro-ops. Proceedings of the 45th Annual International Symposium on Computer Architecture (624-637). DOI: 10.1109/ISCA.2018.00058. Online publication date: 2-Jun-2018
    • (2018) Active Forwarding: Eliminate IOMMU Address Translation for Accelerator-rich Architectures. 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) (1-6). DOI: 10.1109/DAC.2018.8465921. Online publication date: Jun-2018
    • (2017) Enhancing Energy Efficiency of Multimedia Applications in Heterogeneous Mobile Multi-Core Processors. IEEE Transactions on Computers 66:11 (1878-1889). DOI: 10.1109/TC.2017.2710317. Online publication date: 5-Oct-2017
    • (2017) Dynamic GPGPU Power Management Using Adaptive Model Predictive Control. 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) (613-624). DOI: 10.1109/HPCA.2017.34. Online publication date: Feb-2017
