DOI: 10.1145/2967938.2967961

Power Tuning HPC Jobs on Power-Constrained Systems

Published: 11 September 2016

Abstract

As we approach the exascale era, power has become a primary bottleneck. The US Department of Energy has set a power constraint of 20 MW on each exascale machine. To achieve one exaflop under this constraint, power must be used intelligently to maximize performance.
Most production-level parallel applications that run on a supercomputer are tightly coupled. A naïve approach to enforcing a power constraint on a parallel job is to divide the job's power budget uniformly across all its processors. However, previous work has shown that a power-capped job suffers from performance variation across otherwise identical processors, leading to sub-optimal overall performance. We propose a two-level, hierarchical, variation-aware approach to managing power at the machine level. At the macro level, PPartition partitions the machine's power budget across the jobs running on the system, assigning each job a power budget such that the machine never exceeds its own. At the micro level, PTune makes job-centric decisions that take performance variation into account. For every moldable job, PTune determines the optimal number of processors, the selection of processors, and the distribution of the job's power budget across them, with the goal of maximizing the job's performance under its power budget.
Experiments show that, at the micro level, PTune achieves a performance improvement of up to 29% compared to the naïve approach. PTune does not lead to any performance degradation, yet frees up almost 40% of the processors for the same performance as that of the naïve approach under a hard power bound. At the macro level, PPartition achieves a throughput improvement of 5-35% compared to uniform power distribution.
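The contrast between uniform and variation-aware power distribution can be illustrated with a toy model. The sketch below is not PTune; it assumes a hypothetical linear power model in which processor i draws idle_w + watts_per_ghz[i] * frequency watts, and it assumes a tightly coupled job runs at the speed of its slowest processor. All function names, parameters, and numbers are illustrative assumptions rather than values from the paper.

```python
def uniform_speed(budget_w, idle_w, watts_per_ghz):
    """Job speed (GHz) when every processor gets the same power cap.

    Assumed model: a processor capped at p watts runs at
    (p - idle_w) / cost GHz, and the job runs at the slowest frequency.
    """
    per_proc_w = budget_w / len(watts_per_ghz)
    return min((per_proc_w - idle_w) / cost for cost in watts_per_ghz)


def balanced_allocation(budget_w, idle_w, watts_per_ghz):
    """Per-processor caps that give every processor the same frequency.

    Solves sum(idle_w + cost_i * f) == budget_w for the common frequency f,
    so less efficient processors receive a larger share of the budget.
    """
    n = len(watts_per_ghz)
    freq = (budget_w - n * idle_w) / sum(watts_per_ghz)
    caps_w = [idle_w + cost * freq for cost in watts_per_ghz]
    return freq, caps_w


# Hypothetical job: four nominally identical processors whose power cost
# per GHz differs because of manufacturing variation.
costs = [10.0, 12.0, 14.0, 12.0]
naive = uniform_speed(240.0, 20.0, costs)
tuned, caps = balanced_allocation(240.0, 20.0, costs)
print(f"uniform: {naive:.2f} GHz, balanced: {tuned:.2f} GHz")
```

Under this assumed model the balanced allocation is never slower than the uniform one, because the uniform speed is set by the least efficient processor while the balanced speed is set by the average efficiency. The approach described in the abstract additionally chooses how many and which processors to use, which this sketch does not attempt.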


Published In

PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation
September 2016
474 pages
ISBN: 9781450341219
DOI: 10.1145/2967938
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. performance variation
  2. power-constrained computing

Qualifiers

  • Research-article

Conference

PACT '16
Sponsor:
  • IFIP WG 10.3
  • IEEE TCCA
  • SIGARCH
  • IEEE CS TCPP

Acceptance Rates

PACT '16 Paper Acceptance Rate 31 of 119 submissions, 26%;
Overall Acceptance Rate 121 of 471 submissions, 26%
