DOI: 10.1145/1787275.1787326
research-article

Enabling a highly-scalable global address space model for petascale computing

Published: 17 May 2010

Abstract

Over the past decade, the path to the petascale has relied on increasing the complexity and scale of the underlying parallel architectures. Meanwhile, software developers have struggled to provide tools that preserve the productivity of computational science teams using these new systems. In this regard, Global Address Space (GAS) programming models offer a straightforward, easy-to-use addressing model that can improve productivity. However, the scalability of a GAS model depends directly on the design and implementation of its runtime system on the target petascale distributed-memory architecture. In this paper, we describe the design, implementation, and optimization of the Aggregate Remote Memory Copy Interface (ARMCI) runtime library on the 2.3-petaflop Cray XT5 system at Oak Ridge National Laboratory. We optimized our implementation with flow intimation, a technique introduced in this paper. Our optimized ARMCI implementation improves the scalability of both the Global Arrays (GA) programming model and a real-world chemistry application, NWChem, from small jobs up to 180,000 cores.
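In a GAS model, a process names a datum by its position in a single global index space, and the runtime resolves which process owns that element and where it sits in the owner's local memory. The sketch below illustrates only that addressing step, assuming a one-dimensional array split into contiguous blocks in which the first `n % p` ranks hold one extra element. `Location`, `locate`, and `local_range` are hypothetical names chosen for illustration; they are not part of the Global Arrays or ARMCI interfaces.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Where one element of a block-distributed global array lives."""
    owner: int   # rank that stores the element
    offset: int  # index within that rank's local block


def locate(i: int, n: int, p: int) -> Location:
    """Map global index i of an n-element array spread over p ranks.

    Assumed block distribution (illustrative only): the first n % p ranks
    each hold base + 1 elements, the remaining ranks hold base elements.
    """
    if not 0 <= i < n:
        raise IndexError(f"global index {i} outside [0, {n})")
    base, extra = divmod(n, p)
    big = extra * (base + 1)  # total elements held by the larger blocks
    if i < big:
        return Location(i // (base + 1), i % (base + 1))
    return Location(extra + (i - big) // base, (i - big) % base)


def local_range(rank: int, n: int, p: int) -> range:
    """Global indices owned by one rank under the same distribution."""
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)


if __name__ == "__main__":
    # 10 elements over 4 ranks: blocks are [0-2], [3-5], [6-7], [8-9]
    for i in (0, 5, 7, 9):
        print(i, locate(i, n=10, p=4))
    # 0 Location(owner=0, offset=0)
    # 5 Location(owner=1, offset=2)
    # 7 Location(owner=2, offset=1)
    # 9 Location(owner=3, offset=1)
```

When the owner is a different process, a runtime such as ARMCI would serve the access with a one-sided remote memory operation; that communication step, and the scalability issues the paper addresses, are outside the scope of this sketch.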




Published In

CF '10: Proceedings of the 7th ACM international conference on Computing frontiers
May 2010
370 pages
ISBN:9781450300445
DOI:10.1145/1787275


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. armci
  2. flow control
  3. ga
  4. gas
  5. global address space
  6. global arrays
  7. nwchem
  8. pgas
  9. xt5

Qualifiers

  • Research-article

Conference

CF'10
CF'10: Computing Frontiers Conference
May 17-19, 2010
Bertinoro, Italy

Acceptance Rates

CF '10 paper acceptance rate: 30 of 113 submissions, 27%
Overall acceptance rate: 273 of 785 submissions, 35%


Cited By

  • (2014) Fault-Tolerant Routing Algorithm Simulation and Hardware Verification of NoC. IEEE Transactions on Applied Superconductivity 24(5):1-5. DOI: 10.1109/TASC.2014.2346484. Online publication date: Oct-2014.
  • (2013) Scalable PGAS metadata management on extreme scale systems. Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pp. 103-111. DOI: 10.1109/CCGrid.2013.83. Online publication date: 13-May-2013.
  • (2013) A Distributed Run-Time Environment for the Kalray MPPA®-256 Integrated Manycore Processor. Procedia Computer Science 18:1654-1663. DOI: 10.1016/j.procs.2013.05.333. Online publication date: 2013.
  • (2012) GA-GPU. Proceedings of the 9th conference on Computing Frontiers, pp. 53-64. DOI: 10.1145/2212908.2212918. Online publication date: 15-May-2012.
  • (2012) Supporting the Global Arrays PGAS Model Using MPI One-Sided Communication. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp. 739-750. DOI: 10.1109/IPDPS.2012.72. Online publication date: 21-May-2012.
  • (2012) HiCOO. Journal of Parallel and Distributed Computing 72(11):1481-1492. DOI: 10.1016/j.jpdc.2012.01.022. Online publication date: 1-Nov-2012.
  • (2012) Performance characterization of global address space applications: a case study with NWChem. Concurrency and Computation: Practice & Experience 24(2):135-154. DOI: 10.1002/cpe.1881. Online publication date: 1-Feb-2012.
  • (2011) Virtual Topologies for Scalable Resource Management and Contention Attenuation in a Global Address Space Model on the Cray XT5. Proceedings of the 2011 International Conference on Parallel Processing, pp. 235-244. DOI: 10.1109/ICPP.2011.38. Online publication date: 13-Sep-2011.
  • (2011) Network-Friendly One-Sided Communication through Multinode Cooperation on Petascale Cray XT5 Systems. Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 352-361. DOI: 10.1109/CCGrid.2011.62. Online publication date: 23-May-2011.
  • (2010) Cooperative server clustering for a scalable GAS model on petascale cray XT5 systems. Computer Science - Research and Development 25(1-2):57-64. DOI: 10.1007/s00450-010-0104-6. Online publication date: 8-Apr-2010.
