
Exploring the effect of noise on the performance benefit of nonblocking allreduce

Published: 09 September 2014

Abstract

Relaxed synchronization offers the potential to maintain application scalability by allowing many processes to make independent progress when some processes suffer delays. Yet the benefits of this approach in important parallel workloads have not been investigated in detail. In this paper, we use a validated simulation approach to explore the noise-mitigation effects of nonblocking allreduce in workloads where allreduce is a major contributor to total execution time. Although a nonblocking allreduce is unlikely to provide significant benefit in the low-OS-noise environments expected in next-generation HPC systems, we show that it has the potential to improve application runtime in the presence of other noise types.
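
To make the mechanism concrete, the sketch below (illustrative only, assuming an MPI-3 implementation; do_independent_work is a hypothetical placeholder, not code from the paper) shows the pattern the paper studies: a process starts the reduction with the nonblocking MPI_Iallreduce, continues with computation that does not depend on the result, and blocks only in MPI_Wait.

    /* Minimal sketch of overlapping an allreduce with independent work. */
    #include <mpi.h>
    #include <stdio.h>

    static void do_independent_work(void) {
        /* Hypothetical placeholder: work that does not depend on the
           reduction result. A rank delayed by noise here does not stall
           the other ranks inside a blocking collective. */
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        double local_sum = 1.0, global_sum = 0.0;
        MPI_Request req;

        /* Start the reduction; the call returns immediately. */
        MPI_Iallreduce(&local_sum, &global_sum, 1, MPI_DOUBLE,
                       MPI_SUM, MPI_COMM_WORLD, &req);

        do_independent_work();

        /* Synchronize only when the global result is actually needed. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        printf("global sum = %f\n", global_sum);
        MPI_Finalize();
        return 0;
    }

Compared with a blocking MPI_Allreduce at the same point, a delay injected by noise on one rank can be absorbed by the overlapped computation instead of immediately stalling every participant in the collective.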


Published In

EuroMPI/ASIA '14: Proceedings of the 21st European MPI Users' Group Meeting
September 2014
183 pages
ISBN: 9781450328753
DOI: 10.1145/2642769

In-Cooperation

  • Kyoto University
  • University of Tokyo
  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Collective operations
  2. Nonblocking collectives
  3. OS noise

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroMPI/ASIA '14

Acceptance Rates

EuroMPI/ASIA '14 paper acceptance rate: 18 of 39 submissions, 46%
Overall acceptance rate: 18 of 39 submissions, 46%

Cited By

  • (2023) Using MPI's Non-Blocking Allreduce for Health Checks in Dynamic Simulations. Parallel and Distributed Computing, Applications and Technologies, pp. 25-31. DOI: 10.1007/978-981-99-8211-0_3. Online publication date: 29-Nov-2023.
  • (2021) Workload Imbalance in HPC Applications: Effect on Performance of In-Network Processing. 2021 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-8. DOI: 10.1109/HPEC49654.2021.9622847. Online publication date: 20-Sep-2021.
  • (2017) Understanding Performance Variability on the Aries Dragonfly Network. 2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 809-813. DOI: 10.1109/CLUSTER.2017.76. Online publication date: Sep-2017.
  • (2016) Understanding performance interference in next-generation HPC systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.5555/3014904.3014949. Online publication date: 13-Nov-2016.
  • (2016) How I Learned to Stop Worrying and Love In Situ Analytics. Proceedings of the 23rd European MPI Users' Group Meeting, pp. 140-153. DOI: 10.1145/2966884.2966920. Online publication date: 25-Sep-2016.
  • (2016) Understanding Performance Interference in Next-Generation HPC Systems. SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 384-395. DOI: 10.1109/SC.2016.32. Online publication date: Nov-2016.
  • (2016) Scheduling in-situ analytics in next-generation applications. Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pp. 102-105. DOI: 10.1109/CCGrid.2016.42. Online publication date: 16-May-2016.
  • (2015) Towards Understanding Post-recovery Efficiency for Shrinking and Non-shrinking Recovery. Euro-Par 2015: Parallel Processing Workshops, pp. 656-668. DOI: 10.1007/978-3-319-27308-2_53. Online publication date: 18-Dec-2015.
