[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3018743.3018763acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article
Public Access

Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

Published: 26 January 2017 Publication History

Abstract

Modern hardware contains parallel execution resources that are well-suited for data-parallelism-vector units-and task parallelism-multicores. However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently as multicores. Our framework allows us to define schedulers that can dynamically select between executing task- blocks on vector units or multicores. We show that these schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel pro- grams into task block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.

References

[1]
N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread Scheduling for Multiprogrammed Multiprocessors. In SPAA, pages 119--129, 1998.
[2]
J. Barnes and P. Hut. A hierarchical o(nlogn) forcecalculation algorithm. Nature, 324(4):446--449, December 1986.
[3]
R. D. Blumofe and C. E. Leiserson. Scheduling Multithreaded Computations by Work Stealing. J. ACM, 46(5):720--748, 1999.
[4]
R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An Efficient Multithreaded Runtime System. Journal of Parallel and Distributed Computing, 37(1):55--69, August 25, 1996.
[5]
Cilk. Cilk 5.4.6. http://supertech.csail.mit.edu/cilk/.
[6]
I. Corp. Intel Cilk Plus Language Extension Specification, 2011.
[7]
Y. Jo and M. Kulkarni. Enhancing Locality for Recursive Traversals of Recursive Structures. In OOPSLA, pages 463--482, 2011.
[8]
Y. Jo, M. Goldfarb, and M. Kulkarni. Automatic Vectorization of Tree Traversals. In PACT, pages 363--374, 2013.
[9]
Y. A. Liu and S. D. Stoller. From Recursion to Iteration: What are the Optimizations? In PEPM, pages 73--82, 2000.
[10]
S. Maleki, Y. Gao, M. J. Garzaran, T. Wong, and D. A. Padua. An Evaluation of Vectorizing Compilers. In PACT, pages 372--382, 2011.
[11]
OpenMP Architecture Review Board. OpenMP Specification and Features. http://openmp.org/wp/, May 2008.
[12]
M. Pharr and W. R. Mark. ispc: A SPMD Compiler for High-performance CPU Programming. In Innovative Parallel Computing (InPar), 2012, pages 1--13. IEEE, 2012.
[13]
J. Reinders. Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism. O'Reilly, 2007.
[14]
B. Ren, G. Agrawal, J. R. Larus, T. Mytkowicz, T. Poutanen, and W. Schulte. SIMD Parallelization of Applications that Traverse Irregular Data Structures. In 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 1--10. IEEE, 2013.
[15]
B. Ren, Y. Jo, S. Krishnamoorthy, K. Agrawal, and M. Kulkarni. Efficient Execution of Recursive Programs on Commodity Vector Hardware. In PLDI, pages 509--520, 2015.
[16]
M. Rinard and P. C. Diniz. Commutativity analysis: a new analysis technique for parallelizing compilers. ACM Trans. Program. Lang. Syst., 19(6):942--991, 1997. ISSN 0164-0925.
[17]
R. Rugina and M. C. Rinard. Recursion Unrolling for Divide and Conquer Programs. In LCPC, pages 34--48, 2000.
[18]
X10. The X10 Programming Language. www.research.ibm.com/x10/, Mar. 2006.

Cited By

View all
  • (2023)Catamaran: Low-Overhead Memory Safety Enforcement via Parallel AccelerationProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3597926.3598098(816-828)Online publication date: 12-Jul-2023
  • (2019)A Proposed Architecture for Parallel HPC-based Resource Management System for Big Data ApplicationsAdvances in Science, Technology and Engineering Systems Journal10.25046/aj0401054:1Online publication date: 2019
  • (2018)Partial control-flow linearizationACM SIGPLAN Notices10.1145/3296979.319241353:4(543-556)Online publication date: 11-Jun-2018
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
January 2017
476 pages
ISBN:9781450344937
DOI:10.1145/3018743
© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 January 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data parallelism
  2. general scheduler
  3. task parallelism

Qualifiers

  • Research-article

Funding Sources

Conference

PPoPP '17
Sponsor:

Acceptance Rates

PPoPP '17 Paper Acceptance Rate 29 of 132 submissions, 22%;
Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)111
  • Downloads (Last 6 weeks)14
Reflects downloads up to 01 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Catamaran: Low-Overhead Memory Safety Enforcement via Parallel AccelerationProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3597926.3598098(816-828)Online publication date: 12-Jul-2023
  • (2019)A Proposed Architecture for Parallel HPC-based Resource Management System for Big Data ApplicationsAdvances in Science, Technology and Engineering Systems Journal10.25046/aj0401054:1Online publication date: 2019
  • (2018)Partial control-flow linearizationACM SIGPLAN Notices10.1145/3296979.319241353:4(543-556)Online publication date: 11-Jun-2018
  • (2018)Performance Comparison of NVIDIA accelerators with SIMD, Associative, and Multi-core Processors for Air Traffic ManagementWorkshop Proceedings of the 47th International Conference on Parallel Processing10.1145/3229710.3229757(1-10)Online publication date: 13-Aug-2018
  • (2018)Partial control-flow linearizationProceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3192366.3192413(543-556)Online publication date: 11-Jun-2018
  • (2018)Optimizations of Unstructured Aerodynamics Computations for Many-core ArchitecturesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2018.282653329:10(2317-2332)Online publication date: 1-Oct-2018
  • (2018)Compiling SIMT Programs on Multi- and Many-Core Processors with Wide Vector Units: A Case Study with CUDA2018 IEEE 25th International Conference on High Performance Computing (HiPC)10.1109/HiPC.2018.00022(123-132)Online publication date: Dec-2018
  • (2017)Using intra-core loop-task accelerators to improve the productivity and performance of task-based parallel programsProceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3123939.3136952(759-773)Online publication date: 14-Oct-2017
  • (2017)Fast segmented sort on GPUsProceedings of the International Conference on Supercomputing10.1145/3079079.3079105(1-10)Online publication date: 14-Jun-2017
  • (2021)Parallel SIMD - A Policy Based Solution for Free Speed-Up using C++ Data-Parallel Types2021 IEEE/ACM 6th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)10.1109/ESPM254806.2021.00008(20-29)Online publication date: Nov-2021
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media