DOI: 10.1145/3407947.3407968

Runtime prediction of high-performance computing jobs based on ensemble learning

Published: 06 August 2020

Abstract

In high-performance computing job scheduling systems, accurately predicting job runtime makes it possible to exploit the idle resource fragments generated during cluster computing and to backfill jobs into them, improving scheduling performance. Because the runtime of high-performance computing jobs is affected by many factors, runtime prediction is a complicated non-linear problem. Ensemble machine learning methods are therefore used to predict job runtime in cluster computing. Comparing the prediction results of different models on job log data sets from three real high-performance computing systems shows that the LightGBM algorithm achieves higher prediction accuracy, faster computation, shorter model training time, and better overall performance.
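The page does not include the authors' code. The sketch below only illustrates, under stated assumptions, how a LightGBM regressor of the kind the abstract describes could be trained on tabular job-log features. The file name hpc_job_log.csv, the feature columns, and the hyperparameters are hypothetical placeholders, not the paper's actual setup.

```python
# Minimal sketch of a LightGBM job-runtime predictor.
# Assumptions (not from the paper): the job log has been flattened into a CSV
# with hypothetical columns such as requested_time, num_processors, queue_id,
# user_id, submit_hour, and a target column runtime_seconds.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from lightgbm import LGBMRegressor

# Load a flattened job log (hypothetical file name and schema).
jobs = pd.read_csv("hpc_job_log.csv")
features = ["requested_time", "num_processors", "queue_id", "user_id", "submit_hour"]
X, y = jobs[features], jobs["runtime_seconds"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Gradient-boosted decision trees; these hyperparameters are placeholders,
# not the values tuned in the paper.
model = LGBMRegressor(n_estimators=500, learning_rate=0.05, num_leaves=63)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("MAE (seconds):", mean_absolute_error(y_test, pred))
```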




    Information

    Published In

    HP3C 2020: Proceedings of the 2020 4th International Conference on High Performance Compilation, Computing and Communications
    June 2020
    191 pages
    ISBN:9781450376914
    DOI:10.1145/3407947
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • Xi'an Jiaotong-Liverpool University
    • City University of Hong Kong
    • Guangdong University of Technology

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. cluster computing
    2. ensemble learning
    3. job scheduling
    4. machine learning
    5. runtime prediction of jobs

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Numerical Windtunnel project

    Conference

    HP3C 2020

    Bibliometrics & Citations

    Article Metrics

    • Downloads (Last 12 months)43
    • Downloads (Last 6 weeks)4
    Reflects downloads up to 14 Jan 2025

    Cited By
    • (2024)A Data-Driven Approach to Analyzing Fuel-Switching Behavior and Predictive Modeling of Liquefied Natural Gas and Low Sulfur Fuel Oil Consumption in Dual-Fuel VesselsJournal of Marine Science and Engineering10.3390/jmse1212223512:12(2235)Online publication date: 5-Dec-2024
    • (2024)HPC Jobs Classification and Resource Prediction to Minimize Job FailuresProceedings of the International Conference on Computer Systems and Technologies 202410.1145/3674912.3674914(95-101)Online publication date: 14-Jun-2024
    • (2023)Job runtime prediction of HPC cluster based on PC-TransformerThe Journal of Supercomputing10.1007/s11227-023-05470-279:17(20208-20234)Online publication date: 12-Jun-2023
    • (2022)An Ensemble Learning-Based HPC Multi-Resource Demand Prediction Model for Hybrid Clusters2022 3rd International Conference on Computer Science and Management Technology (ICCSMT)10.1109/ICCSMT58129.2022.00094(413-420)Online publication date: Nov-2022
    • (2022)Prediction of job characteristics for intelligent resource allocation in HPC systems: a survey and future directionsFrontiers of Computer Science10.1007/s11704-022-0625-816:5Online publication date: 23-May-2022
