DOI: 10.1145/3627703.3629584
research-article
Open access

Improving GPU Energy Efficiency through an Application-transparent Frequency Scaling Policy with Performance Assurance

Published: 22 April 2024

Abstract

Power consumption is one of the top limiting factors in high-performance computing systems and data centers, and dynamic voltage and frequency scaling (DVFS) is an important mechanism to control power. Existing DVFS-based approaches to improving GPU energy efficiency either degrade performance too much or require offline application profiling or code modification, which severely limits their applicability on large clusters. To address this issue, we propose a novel GPU DVFS policy, GEEPAFS, which improves the energy efficiency of GPUs while providing performance assurance. GEEPAFS is application-transparent: it requires no offline profiling and no code modification of user applications. To achieve this, GEEPAFS models application performance online based on our quantitative analysis of the correlation between performance and GPU memory bandwidth utilization. Based on this relationship, GEEPAFS builds a fold-line frequency-performance model for the applications being executed and uses the model to guide the setting of GPU frequency, maximizing energy efficiency while keeping the performance loss bounded. Through experiments on NVIDIA V100 and A100 GPUs, we show that GEEPAFS improves energy efficiency by 26.7% and 20.2% on average, respectively. While achieving this improvement, the average performance loss is only 5.8%, and the worst-case performance loss is 12.5% among all 33 tested applications.
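To make the idea of a fold-line model concrete, the following is a minimal sketch, not the GEEPAFS implementation. It assumes a two-segment model in which relative performance rises linearly with core frequency up to a knee and is flat above it, and it assumes the knee is already known (the abstract says only that the model is derived online from memory bandwidth utilization, not how). The names `FoldLineModel`, `assumed_power_watts`, and `pick_frequency`, the cubic power model, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldLineModel:
    """Hypothetical two-segment ("fold-line") frequency-performance model.

    Relative performance grows linearly with core frequency up to a knee
    and stays flat above it (assumption: memory-bound beyond the knee).
    """
    knee_mhz: float

    def relative_performance(self, freq_mhz: float) -> float:
        # 1.0 means "same performance as at or above the knee".
        return min(freq_mhz, self.knee_mhz) / self.knee_mhz


def assumed_power_watts(freq_mhz, f_max_mhz, static_w=50.0, dynamic_w=250.0):
    # Placeholder power model (an assumption, not measured data):
    # constant static power plus a dynamic term that scales with f^3.
    return static_w + dynamic_w * (freq_mhz / f_max_mhz) ** 3


def pick_frequency(model, candidates_mhz, max_loss):
    """Pick the candidate with the best modeled performance per watt
    whose modeled performance loss relative to the highest candidate
    frequency does not exceed max_loss (e.g. 0.10 for 10%)."""
    f_max = max(candidates_mhz)
    baseline = model.relative_performance(f_max)
    allowed = [
        f for f in candidates_mhz
        if model.relative_performance(f) >= (1.0 - max_loss) * baseline
    ]
    # f_max always satisfies the bound, so `allowed` is never empty.
    return max(
        allowed,
        key=lambda f: model.relative_performance(f)
        / assumed_power_watts(f, f_max),
    )


if __name__ == "__main__":
    model = FoldLineModel(knee_mhz=1000.0)
    candidates = [600, 950, 1200, 1500]  # hypothetical supported clocks
    print(pick_frequency(model, candidates, max_loss=0.10))
```

In this sketch a tighter loss bound pushes the choice toward the knee, while a looser bound lets the selection drop below it when the modeled energy savings outweigh the modeled slowdown.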

Published In

EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems
April 2024
1245 pages
ISBN:9798400704376
DOI:10.1145/3627703
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DVFS
  2. Data Center
  3. Energy Efficiency
  4. GPU
  5. HPC System
  6. Performance Assurance

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroSys '24

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 2,680
  • Downloads (Last 12 months): 2,680
  • Downloads (Last 6 weeks): 722

Reflects downloads up to 10 Dec 2024.