

HabaneroUPC++: a Compiler-free PGAS Library

Authors:

Zoran Budimlić,

Vivek Sarkar

PGAS '14: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models

Article No.: 5, Pages 1 - 10

https://doi.org/10.1145/2676870.2676879

Published: 06 October 2014

Abstract

Partitioned Global Address Space (PGAS) programming models combine shared and distributed memory features, providing the basis for high performance and high productivity parallel programming environments. UPC++ [39] is a very recent PGAS implementation that takes a library-based approach and avoids the complexities associated with compiler transformations. However, this implementation does not support dynamic task parallelism and relies only on other threading models (e.g., OpenMP or pthreads) to exploit parallelism within a PGAS place.

In this paper, we introduce a compiler-free PGAS library called HabaneroUPC++, which supports a tighter integration of intra-place and inter-place parallelism than standard hybrid programming approaches. The library makes heavy use of C++11 lambda functions in its APIs. C++11 lambdas avoid the need for compiler support while still retaining the syntactic convenience of language-based approaches. The HabaneroUPC++ library implementation is based on a tight integration of the UPC++ library and the Habanero-C++ library, with new extensions to support the integration. The UPC++ library is used to provide PGAS communication and function shipping support using GASNet, and the Habanero-C++ library is used to provide support for intra-place work-stealing integrated with function shipping. We demonstrate the programmability and performance of our implementation using two benchmarks, scaled up to 6K cores. The insights developed in this paper promise to further enhance the usability and popularity of PGAS programming models.
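To illustrate how C++11 lambdas let a plain library offer task-parallel constructs without compiler support, the following is a minimal sketch and not the HabaneroUPC++ API. The names `sketch::FinishScope`, `sketch::finish`, and `parallel_sum` are hypothetical, and the sketch assumes one `std::thread` per task in place of the work-stealing runtime and function shipping described in the abstract.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical names for illustration only; this is NOT the HabaneroUPC++ API.
namespace sketch {

// Tracks the tasks spawned inside one finish scope.
// Assumption: one std::thread per task instead of a work-stealing pool.
class FinishScope {
public:
    // Spawn a task from any callable, typically a C++11 lambda.
    void async(std::function<void()> task) {
        workers_.emplace_back(std::move(task));
    }

    // Block until every task spawned in this scope has completed.
    void join_all() {
        for (std::thread& t : workers_) t.join();
        workers_.clear();
    }

private:
    std::vector<std::thread> workers_;
};

// Run `body`, then wait for all tasks that it spawned.
template <typename Body>
void finish(Body body) {
    FinishScope scope;
    body(scope);
    scope.join_all();
}

}  // namespace sketch

// Sum 1..n by splitting the range cyclically across `chunks` tasks.
long parallel_sum(long n, int chunks = 4) {
    assert(n >= 0 && chunks > 0);
    std::atomic<long> total(0);
    sketch::finish([&](sketch::FinishScope& scope) {
        for (int c = 0; c < chunks; ++c) {
            // Each lambda captures its own chunk index by value.
            scope.async([&total, n, chunks, c] {
                long local = 0;
                for (long i = c + 1; i <= n; i += chunks) local += i;
                total += local;  // atomic accumulation
            });
        }
    });
    return total.load();
}
```

Calling `parallel_sum(100)` spawns four tasks and returns only after all of them have joined. The point of the sketch is that the task body is ordinary C++ passed as a closure, so no source-to-source translation step is involved; a real runtime would schedule such closures on a fixed pool of workers rather than creating a thread for each.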

References

[1] K. Bergman et al. Exascale computing study: Technology challenges in achieving exascale systems. DARPA IPTO, Tech. Rep, 15, 2008.

[2] B. Carlson, T. El-Ghazawi, B. Numrich, and K. Yelick. Programming in the partitioned global address space model. Tutorial at Supercomputing, 2003.

[3] V. Cavé. HClib: a library implementation of the Habanero-C language. http://habanero-rice.github.io/hclib/, 2013.

[4] V. Cavé, J. Zhao, J. Shirako, and V. Sarkar. Habanero-Java: the new adventures of old X10. In PPPJ, pages 51--61, 2011.

[5] B. Chamberlain, D. Callahan, and H. Zima. Parallel programmability and the Chapel language. Int. J. High Perform. Comput. Appl., 21(3):291--312, Aug. 2007.

[6] P. Charles et al. X10: An object-oriented approach to non-uniform cluster computing. In OOPSLA, pages 519--538, 2005.

[7] S. Chatterjee et al. Integrating asynchronous task parallelism with MPI. In IPDPS, pages 712--725, 2013.

[8] G. Cong, S. Kodali, S. Krishnamoorthy, D. Lea, V. Saraswat, and T. Wen. Solving large, irregular graph problems using adaptive work-stealing. In ICPP, pages 536--545, 2008.

[9] J. Dinan, D. B. Larkins, P. Sadayappan, S. Krishnamoorthy, and J. Nieplocha. Scalable work stealing. In SC, pages 53:1--53:11, 2009.

[10] T. El-Ghazawi and L. Smith. UPC: unified parallel C. In SC, 2006.

[11] B. B. Fraguela, J. Guo, G. Bikshandi, M. J. Garzarán, G. Almási, J. Moreira, and D. Padua. The hierarchically tiled arrays programming approach. In LCR, pages 1--12, 2004.

[12] M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI, pages 212--223, 1998.

[13] M. Garland, M. Kudlur, and Y. Zheng. Designing a unified programming model for heterogeneous machines. In SC, 2012.

[14] M. Grossman, A. S. Sbîrlea, Z. Budimlić, and V. Sarkar. CnC-CUDA: Declarative programming for GPUs. In LCPC, pages 230--245, 2011.

[15] S. Imam and V. Sarkar. Habanero-Java library: A Java 8 framework for multicore programming. In PPPJ, pages 75--86, 2014.

[16] J. Järvi and J. Freeman. C++ lambda expressions and closures. Science of Computer Programming, 75(9):762--772, 2010.

[17] V. Kumar, D. Frampton, S. M. Blackburn, D. Grove, and O. Tardieu. Work-stealing without the baggage. In OOPSLA, pages 297--314, 2012.

[18] D. Lea. A Java Fork/Join framework. In JAVA, pages 36--43, 2000.

[19] D. Leijen, W. Schulte, and S. Burckhardt. The design of a task parallel library. In OOPSLA, pages 227--242, 2009.

[20] B. Meister, N. Vasilache, D. Wohlford, M. Baskaran, A. Leung, and R. Lethin. R-Stream Compiler. Springer US, 2011.

[21] S. Min, C. Iancu, and K. Yelick. Hierarchical work stealing on manycore clusters. In PGAS, 2011.

[22] E. Mohr, D. A. Kranz, and R. H. Halstead, Jr. Lazy task creation: A technique for increasing the granularity of parallel programs. In LFP, pages 185--197, 1990.

[23] R. W. Numrich and J. Reid. Co-array Fortran for parallel programming. SIGPLAN Fortran Forum, 17(2):1--31, Aug. 1998.

[24] Compute unified device architecture programming guide. NVIDIA, 2007.

[25] Open Community Runtime. https://01.org/open-community-runtime/, Intel Open Source Technology Center, 2014.

[26] I. Patel and J. Gilbert. An empirical study of the performance and productivity of two parallel programming models. In IPDPS, pages 1--7, April 2008.

[27] J. Reinders. Intel Threading Building Blocks. O'Reilly & Associates, Inc., first edition, 2007.

[28] Habanero-C Overview. https://wiki.rice.edu/confluence/display/HABANERO/Habanero-C, Rice University, 2013.

[29] J. Shirako, D. M. Peixotto, V. Sarkar, and W. N. Scherer. Phasers: A unified deadlock-free construct for collective and point-to-point synchronization. In ICS, pages 277--288, 2008.

[30] J. Shirako, D. M. Peixotto, V. Sarkar, and W. N. Scherer. Phaser accumulators: A new reduction construct for dynamic parallelism. In IPDPS, pages 1--12, 2009.

[31] O. Tardieu et al. X10 and APGAS at petascale. In PPoPP, pages 53--66, 2014.

[32] O. Tardieu, H. Wang, and H. Lin. A work-stealing scheduler for X10's task parallelism with suspension. In PPoPP, pages 267--276, 2012.

[33] S. Tasirlar and V. Sarkar. Data-driven tasks and their implementation. In ICPP, pages 652--661, 2011.

[34] K. Taura, K. Tabata, and A. Yonezawa. StackThreads/MP: Integrating futures into calling standards. In PPoPP, pages 60--71, 1999.

[35] Y. Yan, J. Zhao, Y. Guo, and V. Sarkar. Hierarchical place trees: A portable abstraction for task parallelism and data movement. In LCPC, pages 172--187, 2010.

[36] C. Yang, K. Murthy, and J. Mellor-Crummey. Managing asynchronous operations in Coarray Fortran 2.0. In IPDPS, pages 1321--1332, May 2013.

[37] K. Yelick et al. Titanium: A high-performance Java dialect. In ACM, pages 10--11, 1998.

[38] W. Zhang, O. Tardieu, D. Grove, B. Herta, T. Kamada, V. Saraswat, and M. Takeuchi. GLB: Lifeline-based global load balancing library in X10. In PPAA, pages 31--40, 2014.

[39] Y. Zheng, A. Kamil, M. B. Driscoll, H. Shan, and K. Yelick. UPC++: a PGAS extension for C++. In IPDPS, 2014.


Index Terms

HabaneroUPC++: a Compiler-free PGAS Library
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language features
      2. Language types
        Concurrent programming languages



Information & Contributors

Information

Published In


PGAS '14: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models

October 2014

199 pages

ISBN: 9781450332477

DOI: 10.1145/2676870

Conference Chair: Allen D. Malony
Program Chair: Jeff Hammond

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SIGHPC: ACM Special Interest Group on High Performance Computing
University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article
Research
Refereed limited

Conference

PGAS '14: 8th International Conference on Partitioned Global Address Space Programming Models

October 6 - 10, 2014

Eugene, OR, USA


Bibliometrics & Citations

Article Metrics

Total Citations: 37
Total Downloads: 152
Downloads (Last 12 months): 8
Downloads (Last 6 weeks): 1

Reflects downloads up to 12 Dec 2024

Citations

Cited By

Saini H, Kumar V, Chakraborty T (2024) Energy efficient permanence-based community detection algorithm. Concurrency and Computation: Practice and Experience 36:28. Online publication date: 13-Oct-2024
https://doi.org/10.1002/cpe.8297
Roa Perdomo D, Ceccato R, Neveu R, Yviquel H, Li X, Monsalve Diaz J, Doerfert J (2023) Memory Transfer Decomposition: Exploring Smart Data Movement Through Architecture-Aware Strategies. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pages 1958-1967. Online publication date: 12-Nov-2023
https://dl.acm.org/doi/10.1145/3624062.3624609
Shiina S, Taura K, Mohror K, Arnold D, Badia R (2023) Itoyori: Reconciling Global Address Space and Global Fork-Join Task Parallelism. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-15. Online publication date: 12-Nov-2023
https://dl.acm.org/doi/10.1145/3581784.3607049
Thaker P, Ayers H, Raghavan D, Niu N, Levis P, Zaharia M (2021) Clamor. Proceedings of the ACM Symposium on Cloud Computing, pages 654-669. Online publication date: 1-Nov-2021
https://dl.acm.org/doi/10.1145/3472883.3486996
Kumar S, Gupta A, Kumar V, Bhalachandra S, de Supinski B, Hall M, Gamblin T (2021) Cuttlefish. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-14. Online publication date: 14-Nov-2021
https://dl.acm.org/doi/10.1145/3458817.3476163
Moreton-Fernandez A, Sierra Y, Gonzalez-Escribano A, Llanos D (2021) Operators for Data Redistribution: Applications to the STL Library and RayTracing Algorithm. IEEE Access 9, pages 38557-38570. Online publication date: 2021
https://doi.org/10.1109/ACCESS.2021.3063628
Dionisi T, Bouhrour S, Jaeger J, Carribault P, Pérache M (2021) Enhancing Load-Balancing of MPI Applications with Workshare. Euro-Par 2021: Parallel Processing, pages 466-481. Online publication date: 25-Aug-2021
https://doi.org/10.1007/978-3-030-85665-6_29
Fraguela B, Andrade D (2021) A software cache autotuning strategy for dataflow computing with UPC++ DepSpawn. Computational and Mathematical Methods. Online publication date: 22-Feb-2021
https://doi.org/10.1002/cmm4.1148
Fürlinger K, Gracia J, Knüpfer A, Fuchs T, Hünich D, Jungblut P, Kowalewski R, Schuchart J (2020) DASH: Distributed Data Structures and Parallel Algorithms in a Global Address Space. Software for Exascale Computing - SPPEXA 2016-2019, pages 103-142. Online publication date: 31-Jul-2020
https://doi.org/10.1007/978-3-030-47956-5_6
Grove D, Hamouda S, Herta B, Iyengar A, Kawachiya K, Milthorpe J, Saraswat V, Shinnar A, Takeuchi M, Tardieu O (2019) Failure Recovery in Resilient X10. ACM Transactions on Programming Languages and Systems 41:3, pages 1-30. Online publication date: 2-Jul-2019
https://dl.acm.org/doi/10.1145/3332372
