DOI: 10.5555/2388996.2389086
research-article

Legion: expressing locality and independence with logical regions

Published: 10 November 2012

Abstract

Modern parallel architectures have both heterogeneous processors and deep, complex memory hierarchies. We present Legion, a programming model and runtime system for achieving high performance on these machines. Legion is organized around logical regions, which express both locality and independence of program data, and tasks, functions that perform computations on regions. We describe a runtime system that dynamically extracts parallelism from Legion programs, using a distributed, parallel scheduling algorithm that identifies both independent tasks and nested parallelism. Legion also enables explicit, programmer-controlled movement of data through the memory hierarchy and placement of tasks based on locality information via a novel mapping interface. We evaluate our Legion implementation on three applications: fluid flow on a regular grid, a three-level AMR code solving a heat diffusion equation, and a circuit simulation.
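To make the idea of identifying independent tasks from region information more concrete, the following is a minimal sketch in Python. It is not Legion's API or its scheduling algorithm: the names `Region`, `Task`, `interferes`, and `schedule_waves` are hypothetical, regions are modeled simply as sets of element indices, and the privileges are assumed to be just "read" and "write". Under those assumptions, two tasks are treated as independent when they touch disjoint data or when both only read.

```python
from dataclasses import dataclass


# Hypothetical model: a logical region is a named set of element indices.
@dataclass(frozen=True)
class Region:
    name: str
    elements: frozenset


# Hypothetical model: a task declares the regions it uses and how.
@dataclass(frozen=True)
class Task:
    name: str
    # Each requirement is a (Region, privilege) pair; privilege is "read" or "write".
    requirements: tuple


def interferes(a, b):
    """Return True if the tasks share data and at least one of them writes it."""
    for region_a, priv_a in a.requirements:
        for region_b, priv_b in b.requirements:
            overlap = region_a.elements & region_b.elements
            if overlap and "write" in (priv_a, priv_b):
                return True
    return False


def schedule_waves(tasks):
    """Group tasks, given in program order, into waves that may run in parallel.

    Each task is placed one wave after the latest earlier task it interferes
    with, so sequential semantics are preserved for dependent tasks.
    """
    wave_of = {}
    waves = []
    for i, task in enumerate(tasks):
        wave = 0
        for earlier in tasks[:i]:
            if interferes(earlier, task):
                wave = max(wave, wave_of[earlier.name] + 1)
        wave_of[task.name] = wave
        if wave == len(waves):
            waves.append([])
        waves[wave].append(task.name)
    return waves


# Example: a grid split into two disjoint subregions.
grid = Region("grid", frozenset(range(8)))
left = Region("left", frozenset(range(0, 4)))
right = Region("right", frozenset(range(4, 8)))

tasks = [
    Task("update_left", ((left, "write"),)),
    Task("update_right", ((right, "write"),)),
    Task("sum_all", ((grid, "read"),)),
]

waves = schedule_waves(tasks)
print(waves)
```

In this example the two update tasks write disjoint subregions and land in the same wave, while `sum_all` reads the whole grid and therefore waits for both. The sketch is centralized and quadratic in the number of tasks; the abstract describes a distributed, parallel scheduling algorithm, which this illustration does not attempt to reproduce.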

Published In

SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2012
1161 pages
ISBN: 9781467308045

Publisher

IEEE Computer Society Press

Washington, DC, United States

Qualifiers

  • Research-article

Conference

SC '12
Acceptance Rates

SC '12 Paper Acceptance Rate 100 of 461 submissions, 22%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
