DOI: 10.1145/1504176.1504189
Research article

Mapping parallelism to multi-cores: a machine learning based approach

Published: 14 February 2009

Abstract

The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-based approach to mapping such parallelism using machine learning. It develops two predictors, one data-sensitive and one data-insensitive, to select the best mapping for parallel programs. They predict the number of threads and the scheduling policy for any given program using a model learnt off-line. By using low-cost profiling runs, they predict the mapping for a new, unseen program across multiple input data sets. We evaluate our approach by selecting parallelism mapping configurations for OpenMP programs on two representative but different multi-core platforms (the Intel Xeon and the Cell processors). The performance of our technique is stable across programs and architectures. On average, it delivers more than 96% of the maximum available performance on both platforms. It achieves, on average, a 37% (up to 17.5 times) performance improvement over the OpenMP runtime default scheme on the Cell platform. Compared to two recent prediction models, our predictors achieve better performance with a significantly lower profiling cost.
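As a rough illustration of the workflow the abstract describes (features from a cheap profiling run go in, a thread count and scheduling policy come out), here is a minimal sketch. It is not the paper's method: a 1-nearest-neighbour lookup stands in for the learnt model, and the feature set, training examples, and function names are all hypothetical. Only the environment variables `OMP_NUM_THREADS` and `OMP_SCHEDULE` are standard OpenMP names.

```python
import math

# Hypothetical off-line training data: (feature vector, best mapping found).
# The three features (iteration count, memory-op ratio, branch ratio) are
# illustrative stand-ins, not the features used in the paper.
TRAINING = [
    ((1e6, 0.30, 0.05), (8, "static")),
    ((1e6, 0.60, 0.20), (4, "dynamic")),
    ((1e3, 0.10, 0.02), (1, "static")),
    ((5e5, 0.45, 0.30), (8, "guided")),
]


def _scale(features):
    # Log-scale the iteration count so it does not dominate the distance.
    iters, mem_ratio, branch_ratio = features
    return (math.log10(iters), mem_ratio, branch_ratio)


def predict_mapping(features, training=TRAINING):
    """Return (threads, schedule) of the closest training program.

    A 1-nearest-neighbour stand-in for the learnt predictor.
    """
    target = _scale(features)
    _, mapping = min(training, key=lambda ex: math.dist(_scale(ex[0]), target))
    return mapping


def openmp_env(mapping):
    """Express a predicted mapping as standard OpenMP environment variables."""
    threads, schedule = mapping
    return {"OMP_NUM_THREADS": str(threads), "OMP_SCHEDULE": schedule}


def relative_performance(best_time, predicted_time):
    """Fraction of the best achievable performance reached by a prediction."""
    return best_time / predicted_time


if __name__ == "__main__":
    mapping = predict_mapping((2e3, 0.12, 0.03))
    print(mapping, openmp_env(mapping))
```

The resulting dictionary could be merged into the environment of an OpenMP binary before launching it. Note that `OMP_SCHEDULE` only affects loops declared with `schedule(runtime)`. The `relative_performance` helper mirrors the kind of metric the abstract reports, where a value of 0.96 would correspond to 96% of the best available performance.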




      Published In

      PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
      February 2009
      322 pages
      ISBN:9781605583976
      DOI:10.1145/1504176
      • ACM SIGPLAN Notices, Volume 44, Issue 4
        PPoPP '09
        April 2009
        294 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/1594835
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. artificial neural networks
      2. compiler optimization
      3. machine learning
      4. performance modeling
      5. support vector machine

      Qualifiers

      • Research-article

      Conference

      PPoPP09
      Acceptance Rates

      Overall Acceptance Rate 230 of 1,014 submissions, 23%

      Article Metrics

      • Downloads (last 12 months): 52
      • Downloads (last 6 weeks): 3
      Reflects downloads up to 03 Mar 2025.

      Cited By

      • (2023) Automated Mapping of Task-Based Programs onto Distributed and Heterogeneous Machines. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13. DOI: 10.1145/3581784.3607079. Online publication date: 12-Nov-2023
      • (2022) Data Locality in High Performance Computing, Big Data, and Converged Systems: An Analysis of the Cutting Edge and a Future System Architecture. Electronics, 12:1 (53). DOI: 10.3390/electronics12010053. Online publication date: 23-Dec-2022
      • (2022) Adaptive Model Selection for Video Super Resolution. 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pp. 1088-1094. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00172. Online publication date: Dec-2022
      • (2022) Prediction of job characteristics for intelligent resource allocation in HPC systems: a survey and future directions. Frontiers of Computer Science: Selected Publications from Chinese Universities, 16:5. DOI: 10.1007/s11704-022-0625-8. Online publication date: 1-Oct-2022
      • (2022) Predicting number of threads using balanced datasets for openMP regions. Computing, 105:5 (999-1017). DOI: 10.1007/s00607-022-01081-6. Online publication date: 30-Apr-2022
      • (2022) Optimizing Sparse Matrix Multiplications for Graph Neural Networks. Languages and Compilers for Parallel Computing, pp. 101-117. DOI: 10.1007/978-3-030-99372-6_7. Online publication date: 24-Mar-2022
      • (2021) Narrowing the Search Space of Applications Mapping on Hierarchical Topologies. 2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), pp. 118-128. DOI: 10.1109/PMBS54543.2021.00018. Online publication date: Nov-2021
      • (2021) Memory Utilization and Machine Learning Techniques for Compiler Optimization. ITM Web of Conferences, 37 (01021). DOI: 10.1051/itmconf/20213701021. Online publication date: 17-Mar-2021
      • (2020) Modeling and optimizing NUMA effects and prefetching with machine learning. Proceedings of the 34th ACM International Conference on Supercomputing, pp. 1-13. DOI: 10.1145/3392717.3392765. Online publication date: 29-Jun-2020
      • (2020) Resource-Aware Data Parallel Array Processing. International Journal of Parallel Programming. DOI: 10.1007/s10766-020-00664-0. Online publication date: 9-Jun-2020
