Light-Weight Prediction for Improving Energy Consumption in HPC Platforms

  • Conference paper
Euro-Par 2024: Parallel Processing (Euro-Par 2024)

Abstract

With the growing demand for computing resources and the struggle to supply the necessary energy, power-aware resource management is becoming a major issue for the high-performance computing (HPC) community. Integrating reliable energy management into a supercomputer’s resource and job management system (RJMS) is not an easy task. The energy consumption of jobs is rarely known in advance, and the workload of every machine is unique.

We argue that the first step toward properly managing energy is to deeply understand the energy consumption of the workload, which involves predicting the workload’s power consumption and exploiting it with smart power-aware scheduling algorithms. Crucial questions are (i) how sophisticated a prediction method needs to be to provide accurate workload power predictions, and (ii) to what extent accurate workload power prediction translates into efficient energy management.

In this work, we propose a method to predict and exploit HPC workloads’ power consumption, with the objective of reducing the supercomputer’s power consumption while maintaining the management (scheduling) performance of the RJMS. Our method exploits workload submission logs with power monitoring data, and relies on a mix of light-weight power prediction methods and a heuristic inspired by classical EASY Backfilling.
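As a rough illustration of what an EASY Backfilling-inspired, power-aware heuristic might look like, the sketch below treats a power budget as a second resource next to nodes. The abstract does not specify the heuristic, so everything here is an assumption: the `Job` fields, the function name `easy_power_pass`, and the use of a fixed power budget are hypothetical and are not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int        # requested nodes
    walltime: float   # user-requested runtime bound (seconds)
    watts: float      # predicted power draw (assumed to come from a predictor)


def easy_power_pass(queue, running, free_nodes, free_watts, now):
    """One scheduling pass; returns the jobs to start now.

    queue:   waiting jobs in FCFS order
    running: list of (end_time, nodes, watts) for jobs already running
    """
    started = []
    queue = list(queue)
    # Phase 1: start jobs from the head while nodes and power both suffice.
    while queue and queue[0].nodes <= free_nodes and queue[0].watts <= free_watts:
        job = queue.pop(0)
        free_nodes -= job.nodes
        free_watts -= job.watts
        running = running + [(now + job.walltime, job.nodes, job.watts)]
        started.append(job)
    if not queue:
        return started
    # Phase 2: reserve for the blocked head job. Walk expected releases
    # until it fits; that instant is the "shadow time".
    head = queue[0]
    nodes, watts, shadow = free_nodes, free_watts, float("inf")
    for end, n, w in sorted(running):
        nodes += n
        watts += w
        if head.nodes <= nodes and head.watts <= watts:
            shadow = end
            break
    # Resources left over at shadow time once the head job has started.
    extra_nodes = nodes - head.nodes
    extra_watts = watts - head.watts
    # Phase 3: backfill jobs that cannot delay the head job's reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes or job.watts > free_watts:
            continue
        ends_before_shadow = now + job.walltime <= shadow
        fits_in_extra = job.nodes <= extra_nodes and job.watts <= extra_watts
        if ends_before_shadow or fits_in_extra:
            free_nodes -= job.nodes
            free_watts -= job.watts
            if not ends_before_shadow:
                extra_nodes -= job.nodes
                extra_watts -= job.watts
            started.append(job)
    return started
```

In this sketch a job is backfilled only if it fits in the currently free nodes and power, and either ends before the head job's reservation or uses only resources the head job will not need.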

We base this study on logs of Marconi 100, a 980-server supercomputer. We show using simulation that a light-weight history-based prediction method can provide sufficiently accurate power predictions to improve the energy management of a large-scale supercomputer compared to energy-unaware scheduling algorithms. These improvements have no significant negative impact on performance.
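To make the idea of a light-weight history-based predictor concrete, here is a minimal sketch that estimates a job's mean power as the average over the same user's most recent jobs. The per-user key, the window size, and the fallback value are illustrative assumptions; the abstract does not state which history features or parameters the authors use.

```python
from collections import defaultdict, deque


class HistoryPowerPredictor:
    """Predict a job's mean power from the same user's recent jobs.

    Hypothetical sketch: the per-user key, the window size, and the
    fallback value are assumptions, not the paper's configuration.
    """

    def __init__(self, window=5, default_watts=1000.0):
        self.default_watts = default_watts
        # Per-user bounded history of observed mean job power (watts).
        self.history = defaultdict(lambda: deque(maxlen=window))

    def predict(self, user):
        # .get() does not create an entry for users never seen before.
        past = self.history.get(user)
        if not past:
            return self.default_watts
        return sum(past) / len(past)

    def observe(self, user, mean_watts):
        # Record the measured mean power once a job has finished.
        self.history[user].append(mean_watts)
```

A scheduler simulation would call `predict` at submission time and `observe` at job completion, so the predictor needs no training phase and only constant memory per user.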

INP—Institute of engineering Univ. Grenoble Alpes.

Notes

  1. The term homogeneous in this paper means that all computing nodes have the same CPU/accelerator configuration.

References

  1. Antici, F., Yamamoto, K., Domke, J., Kiziltan, Z.: Augmenting ML-based predictive modelling with NLP to forecast a job’s power consumption. In: Proceedings of the SC’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1820–1830 (2023)

  2. Bates, N., et al.: Electrical grid and supercomputing centers: an investigative analysis of emerging opportunities and challenges. Informatik-Spektrum 38(2), 111–127 (2015)

  3. Borghesi, A., Bartolini, A., Lombardi, M., Milano, M., Benini, L.: Predictive modeling for job power consumption in HPC systems. In: Kunkel, J.M., Balaji, P., Dongarra, J. (eds.) ISC High Performance 2016. LNCS, vol. 9697, pp. 181–199. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41321-1_10

  4. Borghesi, A., et al.: M100 ExaData: a data collection campaign on the CINECA’s Marconi100 tier-0 supercomputer. Sci. Data 10(1), 288 (2023)

  5. Bugbee, B., Phillips, C., Egan, H., Elmore, R., Gruchalla, K., Purkayastha, A.: Prediction and characterization of application power use in a high-performance computing environment. Stat. Anal. Data Mining ASA Data Sci. J. 10(3), 155–165 (2017)

  6. Casanova, H., Giersch, A., Legrand, A., Quinson, M., Suter, F.: Versatile, scalable, and accurate simulation of distributed applications and platforms. J. Parallel Distrib. Comput. 74(10), 2899–2917 (2014)

  7. Chasapis, D., Moretó, M., Schulz, M., Rountree, B., Valero, M., Casas, M.: Power efficient job scheduling by predicting the impact of processor manufacturing variability. In: Proceedings of the ACM International Conference on Supercomputing, pp. 296–307 (2019)

  8. Da Costa, G., Pierson, J.M., Fontoura-Cupertino, L.: Mastering system and power measures for servers in datacenter. Sustain. Comput. Inform. Syst. 15, 28–38 (2017). https://doi.org/10.1016/j.suscom.2017.05.003

  9. Dutot, P.F., Mercier, M., Poquet, M., Richard, O.: Batsim: a realistic language-independent resources and jobs management systems simulator. In: 20th Workshop on Job Scheduling Strategies for Parallel Processing, Chicago, United States (2016). https://hal.science/hal-01333471

  10. Emeras, J.: Workload Traces Analysis and Replay in Large Scale Distributed Systems. Theses, Université de Grenoble (2013)

  11. Feitelson, D.G., Rudolph, L., Schwiegelshohn, U., Sevcik, K.C., Wong, P.: Theory and practice in parallel job scheduling. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1997. LNCS, vol. 1291, pp. 1–34. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63574-2_14

  12. Feitelson, D.G., Weil, A.M.: Utilization and predictability in scheduling the IBM SP2 with backfilling. In: Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing, pp. 542–546. IEEE (1998)

  13. Gaussier, E., Glesser, D., Reis, V., Trystram, D.: Improving backfilling by using machine learning to predict running times. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC 2015. Association for Computing Machinery, New York (2015)

  14. Khan, K.N., Hirki, M., Niemi, T., Nurminen, J.K., Ou, Z.: RAPL in action: experiences in using RAPL for power measurements. ACM Trans. Model. Perform. Eval. Comput. Syst. 3(2) (2018). https://doi.org/10.1145/3177754

  15. Kocot, B., Czarnul, P., Proficz, J.: Energy-aware scheduling for high-performance computing systems: a survey. Energies 16(2), 890 (2023)

  16. Oak Ridge National Laboratory: Frontier’s architecture (2023). https://olcf.ornl.gov/wp-content/uploads/Frontiers-Architecture-Frontier-Training-Series-final.pdf. Accessed 29 Nov 2023

  17. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  18. Poquet, M., Carastan-Santos, D., Da Costa, G., Stolf, P., Trystram, D.: Artifact data of article “Light-weight prediction for improving energy consumption in HPC platforms”. Euro-Par 2024 (2024). https://doi.org/10.5281/zenodo.11173631

  19. Saillant, T., Weill, J.-C., Mougeot, M.: Predicting job power consumption based on RJMS submission data in HPC systems. In: Sadayappan, P., Chamberlain, B.L., Juckeland, G., Ltaief, H. (eds.) ISC High Performance 2020. LNCS, vol. 12151, pp. 63–82. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50743-5_4

  20. Shoukourian, H., Wilde, T., Auweter, A., Bode, A.: Predicting the energy and power consumption of strong and weak scaling HPC applications. Supercomput. Front. Innovations 1(2), 20–41 (2014)

  21. Storlie, C., Sexton, J., Pakin, S., Lang, M., Reich, B., Rust, W.: Modeling and predicting power consumption of high performance computing jobs (2015)

  22. Wikipedia: 2021 Texas power crisis (2023). https://en.wikipedia.org/wiki/2021_Texas_power_crisis. Accessed 29 Nov 2023

  23. Zrigui, S., de Camargo, R.Y., Legrand, A., Trystram, D.: Improving the performance of batch schedulers using online job runtime classification. J. Parallel Distrib. Comput. 164, 83–95 (2022)


Acknowledgements and Artifact Availability

Experiments presented in this paper were carried out using the Grid’5000 testbed. This work was supported by the research program on Edge Intelligence of the Multi-disciplinary Institute on Artificial Intelligence MIAI at Grenoble Alpes (ANR-19-P3IA-0003), ENERGUMEN (ANR-18-CE25-0008), the France 2030 NumPEx Exa-SofT (ANR-22-EXNU-0003) and Cloud CareCloud (ANR-23-PECL-0003) projects managed by the French National Research Agency (ANR), REGALE (H2020-JTI-EuroHPC-2019-1 agreement n. 956560), and LIGHTAIDGE (HORIZON-MSCA-2022-PF-01 agreement n. 101107953). A CC-BY public copyright licence has been applied by the authors to the present document and will be applied to all subsequent versions up to the Author Accepted Manuscript arising from this submission, in accordance with the grants’ open access conditions. We thank Salah Zrigui for starting the study on the job energy profiles. We also thank Francesco Antici for curating and sharing the Marconi100 dataset. The experiments described in this article were designed with open science and reproducibility in mind. Code, data, and documentation to reproduce our work are available on Zenodo [18].

Author information

Corresponding author

Correspondence to Georges Da Costa.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Carastan-Santos, D., Da Costa, G., Poquet, M., Stolf, P., Trystram, D. (2024). Light-Weight Prediction for Improving Energy Consumption in HPC Platforms. In: Carretero, J., Shende, S., Garcia-Blas, J., Brandic, I., Olcoz, K., Schreiber, M. (eds) Euro-Par 2024: Parallel Processing. Euro-Par 2024. Lecture Notes in Computer Science, vol 14801. Springer, Cham. https://doi.org/10.1007/978-3-031-69577-3_11

  • DOI: https://doi.org/10.1007/978-3-031-69577-3_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-69576-6

  • Online ISBN: 978-3-031-69577-3

  • eBook Packages: Computer Science, Computer Science (R0)
