DOI: 10.1145/3673038.3673059
research-article
Open access

A Hybrid Machine Learning Method for Cross-Platform Performance Prediction of Parallel Applications

Published: 12 August 2024

Abstract

Accurately predicting the performance of parallel applications across diverse architectures is crucial for cost-effective platform selection and optimization. Existing analytic approaches make it difficult to build accurate, scalable, and comprehensive models, and their applicability is limited because they miss fine-grained interdependencies between system architecture and application. In this paper, we propose a hybrid machine learning methodology that maps performance across heterogeneous computing platforms through their mutual performance ratios. Using a reference platform, the methodology lets users predict the relative performance of a parallel application on other systems without fully executing it there. We demonstrate that observing brief partial executions of an application on the reference platform is sufficient; our trained models can then predict the application's performance on several target platforms. We present our novel Ensemble Cluster Classify Regress method as the predictive kernel that maximizes the models' accuracy, efficiency, scalability, and interpretability. We also propose an automatic mechanism that maps corresponding CPU bursts in parallel applications and labels the data by computing their ratios. The models are generated automatically from the training dataset, bypassing the challenging and error-prone procedure of creating analytic models. Consequently, our data-driven approach is more accessible to developers with limited performance knowledge, outperforming existing methods that require advanced hardware and analytics expertise. Our experiments across various platforms and applications demonstrate a cross-validation accuracy exceeding 98% for the predictive models, along with the capability to forecast execution times of unseen applications with an accuracy exceeding 94%. Integrating our innovative macrobenchmark kernels leads to a significant improvement in prediction accuracy.
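
The ratio-based idea in the abstract can be illustrated with a toy sketch. This is not the authors' Ensemble Cluster Classify Regress method: it assumes cluster centroids are already available, replaces the classifier with a nearest-centroid lookup, and replaces the regressor with a per-cluster mean ratio. All names, feature choices, and numbers below are hypothetical.

```python
from statistics import mean

def nearest(features, centroids):
    """Index of the centroid closest to `features` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(features, centroids[i])),
    )

def label_ratios(ref_bursts, tgt_bursts):
    """Pair bursts sharing a (rank, index) key; label each with target/reference time."""
    return [
        (features, tgt_bursts[key] / ref_time)
        for key, (features, ref_time) in ref_bursts.items()
        if key in tgt_bursts
    ]

def fit_cluster_ratios(labeled, centroids):
    """Average the labeled ratios of the bursts that fall into each cluster."""
    buckets = {i: [] for i in range(len(centroids))}
    for features, ratio in labeled:
        buckets[nearest(features, centroids)].append(ratio)
    return {i: mean(r) for i, r in buckets.items() if r}

def predict_target_time(new_ref_bursts, centroids, cluster_ratio):
    """Scale each reference burst time by its cluster's ratio and sum the results."""
    return sum(
        ref_time * cluster_ratio[nearest(features, centroids)]
        for features, ref_time in new_ref_bursts
    )

# Hypothetical training data: (rank, burst index) -> ((IPC, miss rate), seconds).
ref_bursts = {
    (0, 0): ((2.0, 0.01), 1.0),
    (0, 1): ((0.5, 0.20), 2.0),
    (1, 0): ((2.1, 0.02), 1.0),
    (1, 1): ((0.6, 0.25), 4.0),
}
# Same bursts timed on a hypothetical target platform, in seconds.
tgt_bursts = {(0, 0): 0.5, (0, 1): 3.0, (1, 0): 0.5, (1, 1): 6.0}
# Centroids assumed to come from a prior clustering step (not shown).
centroids = [(2.0, 0.0), (0.5, 0.2)]

cluster_ratio = fit_cluster_ratios(label_ratios(ref_bursts, tgt_bursts), centroids)

# Bursts from a brief partial run of an unseen application on the reference platform.
new_ref_bursts = [((1.9, 0.01), 2.0), ((0.4, 0.22), 2.0)]
predicted = predict_target_time(new_ref_bursts, centroids, cluster_ratio)
print(cluster_ratio, predicted)
```

In this sketch, compute-bound bursts run twice as fast on the target while memory-bound bursts run 1.5 times slower, so a single global speedup factor would mispredict both groups; predicting one ratio per burst cluster avoids that.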

    Published In

    ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing
    August 2024
    1279 pages
    ISBN: 9798400717932
    DOI: 10.1145/3673038
    This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. AutoML
    2. Classification
    3. Clustering
    4. Gradient Boosting

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPP '24

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%
