DOI: 10.1145/3307681.3326607
research-article

PERQ: Fair and Efficient Power Management of Power-Constrained Large-Scale Computing Systems

Published: 17 June 2019

Abstract

Large-scale computing systems are becoming increasingly power-constrained, yet they employ hardware over-provisioning to achieve higher system throughput, because applications often do not consume the peak power capacity of nodes. Unfortunately, focusing on system throughput alone can lead to severe unfairness among concurrently running applications. This paper introduces PERQ, a new principled, feedback-based approach that improves system throughput while achieving fairness among concurrent applications.

Published In

HPDC '19: Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing
June 2019
278 pages
ISBN: 9781450366700
DOI: 10.1145/3307681

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. HPC systems
  2. fairness
  3. over-provisioning
  4. performance
  5. power efficiency

Acceptance Rates

HPDC '19 Paper Acceptance Rate 22 of 106 submissions, 21%
Overall Acceptance Rate 166 of 966 submissions, 17%

Cited By

  • (2024) Toward Sustainable HPC: In-Production Deployment of Incentive-Based Power Efficiency Mechanism on the Fugaku Supercomputer. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 1-16. DOI: 10.1109/SC41406.2024.00030. Online publication date: 17-Nov-2024
  • (2023) Challenges and Opportunities in Sustainable Serverless Computing. ACM SIGEnergy Energy Informatics Review 3:3, 53-58. DOI: 10.1145/3630614.3630624. Online publication date: 25-Oct-2023
  • (2023) DDPC: Automated Data-Driven Power-Performance Controller Design on-the-fly for Latency-sensitive Web Services. Proceedings of the ACM Web Conference 2023, 3067-3076. DOI: 10.1145/3543507.3583437. Online publication date: 30-Apr-2023
  • (2023) Market Mechanism-Based User-in-the-Loop Scalable Power Oversubscription for HPC Systems. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 485-498. DOI: 10.1109/HPCA56546.2023.10071006. Online publication date: Feb-2023
  • (2023) Dynamic power budget redistribution under a power cap on multi-application environments. Sustainable Computing: Informatics and Systems 38, 100865. DOI: 10.1016/j.suscom.2023.100865. Online publication date: Apr-2023
  • (2022) AI-Enabling Workloads on Large-Scale GPU-Accelerated System: Characterization, Opportunities, and Implications. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1224-1237. DOI: 10.1109/HPCA53966.2022.00093. Online publication date: Apr-2022
  • (2021) Operating Liquid-Cooled Large-Scale Systems: Long-Term Monitoring, Reliability Analysis, and Efficiency Measures. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 881-893. DOI: 10.1109/HPCA51647.2021.00078. Online publication date: Feb-2021
  • (2020) Job characteristics on large-scale systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-17. DOI: 10.5555/3433701.3433812. Online publication date: 9-Nov-2020
  • (2020) Fine-grained Powercap Allocation for Power-constrained Systems based on Multi-objective Machine Learning. IEEE Transactions on Parallel and Distributed Systems, 1-1. DOI: 10.1109/TPDS.2020.3045983. Online publication date: 2020
  • (2020) Job Characteristics on Large-Scale Systems: Long-Term Analysis, Quantification, and Implications. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 1-17. DOI: 10.1109/SC41405.2020.00088. Online publication date: Nov-2020
