DOI: 10.1109/SC.2014.58
research-article

Parallel programming with migratable objects: Charm++ in practice

Published: 16 November 2014

Abstract

The advent of petascale computing has introduced new challenges (e.g., heterogeneity, system failure) for programming scalable parallel applications. The increased complexity and dynamism of today's science and engineering applications have further exacerbated the situation. Addressing these challenges requires more emphasis on concepts that were previously of secondary importance, including migratability, adaptivity, and runtime system introspection. In this paper, we leverage our experience with these concepts to demonstrate their applicability and efficacy for real-world applications. Using the Charm++ parallel programming framework, we present details on how these concepts can lead to the development of applications that scale irrespective of the rough landscape of supercomputing technology. The empirical evaluation presented in this paper spans many mini-applications and real applications executed on modern supercomputers, including Blue Gene/Q, Cray XE6, and Stampede.

Published In

SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2014
1054 pages
ISBN: 9781479955008
  • General Chair: Trish Damkroger
  • Program Chair: Jack Dongarra

Publisher

IEEE Press

Qualifiers

  • Research-article

Conference

SC '14

Acceptance Rates

SC '14 Paper Acceptance Rate 83 of 394 submissions, 21%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
