research-article

Parallel programming with migratable objects: Charm++ in practice

Authors: Abhishek Gupta, Harshitha Menon, Michael Robson, Lukasz Wesolowski, Laxmikant Kale

SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Pages 647 - 658

https://doi.org/10.1109/SC.2014.58

Published: 16 November 2014

Abstract

The advent of petascale computing has introduced new challenges (e.g., heterogeneity and system failures) for programming scalable parallel applications. The increased complexity and dynamism of today's science and engineering applications have further exacerbated the situation. Addressing these challenges requires more emphasis on concepts that were previously of secondary importance, including migratability, adaptivity, and runtime system introspection. In this paper, we leverage our experience with these concepts to demonstrate their applicability and efficacy for real-world applications. Using the Charm++ parallel programming framework, we present details on how these concepts can lead to the development of applications that scale irrespective of the rough landscape of supercomputing technology. The empirical evaluation presented in this paper spans many mini-applications and real applications executed on modern supercomputers, including Blue Gene/Q, Cray XE6, and Stampede.

Information & Contributors

Information

Published In

SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2014

1054 pages

ISBN: 9781479955008

General Chair: Trish Damkroger, Lawrence Livermore National Laboratory, Livermore, California

Program Chair: Jack Dongarra, University of Tennessee, Knoxville, Tennessee

Publisher

IEEE Press

Qualifiers

Research-article

Conference

SC '14

Sponsors: SIGHPC, SIGARCH, IEEE-CS

SC '14: International Conference for High Performance Computing, Networking, Storage and Analysis

November 16 - 21, 2014

New Orleans, Louisiana

Acceptance Rates

SC '14 Paper Acceptance Rate: 83 of 394 submissions, 21%

Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 48

Total Downloads: 244

Downloads (Last 12 months): 1

Downloads (Last 6 weeks): 0

Reflects downloads up to 18 Jan 2025

Citations

Cited By

Ma X, Yan F, Yang L, Foster I, Papka M, Liu Z, Kettimuthu R, Balsamo S, Knottenbelt W, Abad C, Shang W (2024). MalleTrain: Deep Neural Networks Training on Unfillable Supercomputer Nodes. Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering, pp. 190-200. DOI: 10.1145/3629526.3645035. Online publication date: 7-May-2024.
https://dl.acm.org/doi/10.1145/3629526.3645035

Bakosi J (2024). Partition deactivation with load balancing for parallel flow simulations. Journal of Computational Physics, 519:C. DOI: 10.1016/j.jcp.2024.113387. Online publication date: 15-Dec-2024.
https://dl.acm.org/doi/10.1016/j.jcp.2024.113387

Finnerty P, Posner J, Bürger J, Takaoka L, Kanzaki T (2024). On the Performance of Malleable APGAS Programs and Batch Job Schedulers. SN Computer Science, 5:4. DOI: 10.1007/s42979-024-02641-7. Online publication date: 27-Mar-2024.
https://dl.acm.org/doi/10.1007/s42979-024-02641-7

Posner J, Goebel R, Finnerty P (2024). Evolving APGAS Programs: Automatic and Transparent Resource Adjustments at Runtime. Asynchronous Many-Task Systems and Applications, pp. 154-165. DOI: 10.1007/978-3-031-61763-8_15. Online publication date: 14-Feb-2024.
https://dl.acm.org/doi/10.1007/978-3-031-61763-8_15

Faj J, Kenter T, Faghih-Naini S, Plessl C, Aizinger V, Huebl A, Silvano C, Robinson T (2023). Scalable Multi-FPGA Design of a Discontinuous Galerkin Shallow-Water Model on Unstructured Meshes. Proceedings of the Platform for Advanced Scientific Computing Conference, pp. 1-12. DOI: 10.1145/3592979.3593407. Online publication date: 26-Jun-2023.
https://dl.acm.org/doi/10.1145/3592979.3593407

Kawanishi Y, Finnerty P, Kamada T, Ohta C, Chen Q, Huang Z, Si M (2023). Distributed Cell Set: A Library for Space-Dependent Communication/Computation Overlap on Manycore Cluster. Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 11-19. DOI: 10.1145/3582514.3582520. Online publication date: 25-Feb-2023.
https://dl.acm.org/doi/10.1145/3582514.3582520

Shan B, Araya-Polo M, Malik A, Chapman B, Chen Q, Huang Z, Si M (2023). MPI-based Remote OpenMP Offloading: A More Efficient and Easy-to-use Implementation. Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 50-59. DOI: 10.1145/3582514.3582519. Online publication date: 25-Feb-2023.
https://dl.acm.org/doi/10.1145/3582514.3582519

Finnerty P, Kamada T, Ohta C (2022). Integrating a global load balancer to an APGAS distributed collections library. Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 55-64. DOI: 10.1145/3528425.3529102. Online publication date: 2-Apr-2022.
https://dl.acm.org/doi/10.1145/3528425.3529102

Choi J, Fink Z, White S, Bhat N, Richards D, Kale L (2022). Accelerating communication for parallel programming models on GPU systems. Parallel Computing, 113:C. DOI: 10.1016/j.parco.2022.102969. Online publication date: 1-Oct-2022.
https://dl.acm.org/doi/10.1016/j.parco.2022.102969

Posner J, Fohry C (2021). Transparent Resource Elasticity for Task-Based Cluster Environments with Work Stealing. 50th International Conference on Parallel Processing Workshop, pp. 1-10. DOI: 10.1145/3458744.3473361. Online publication date: 9-Aug-2021.
https://dl.acm.org/doi/10.1145/3458744.3473361
