Achieving high performance on extremely large parallel machines: performance prediction and load balancing
Publisher: University of Illinois at Urbana-Champaign, Champaign, IL, United States
ISBN: 978-0-542-50631-4
Order Number: AAI3202198
Pages: 163
Abstract

Parallel machines with an extremely large number of processors are now in operation; for example, the IBM BlueGene/L machine with 128K processors is currently being deployed. Writing parallel programs that exploit the enormous compute power of such machines, and manually scaling applications to them, will be a significant challenge for application developers. Solving these problems requires suitable parallel programming models and ways to address issues such as load imbalance. This thesis explores processor virtualization in the Charm++ programming model, using migratable objects to program petaflops-class machines. This approach is supported by parallel emulation for algorithm validation, parallel simulation for performance prediction, and new kinds of automatic load balancing strategies, which together substantially address the challenges of programming very large machines.
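As a rough illustration of the over-decomposition idea behind processor virtualization, the plain C++ sketch below (not the Charm++ API; the object counts and round-robin placement are invented for the example) creates many more migratable work objects than processors and lets a runtime-owned mapping decide where each object runs:

```cpp
// Illustrative sketch only: over-decomposition into migratable objects.
// "Migration" here is simply an update to a runtime-owned mapping table;
// a load balancer would choose new placements from measured loads.
#include <cstdio>
#include <vector>

struct WorkObject { int id; double load; };

int main() {
    const int numProcs   = 4;
    const int numObjects = 32;                  // many objects per processor
    std::vector<WorkObject> objects;
    std::vector<int> objToProc(numObjects);     // mapping owned by the runtime
    for (int i = 0; i < numObjects; ++i) {
        objects.push_back({i, 1.0});
        objToProc[i] = i % numProcs;            // initial round-robin placement
    }
    objToProc[5] = 3;                           // "migrate" object 5 to processor 3
    std::printf("object 5 now runs on processor %d\n", objToProc[5]);
}
```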

It is important to understand the performance of parallel applications on very large parallel machines. This thesis explores Parallel Discrete Event Simulation (PDES) techniques to simulate parallel applications and predict their performance. We present a novel optimistic synchronization protocol that exploits the inherent determinacy of parallel applications to effectively reduce synchronization overhead.
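The self-contained C++ sketch below is a generic illustration, not the protocol developed in the thesis: it assumes event handlers are deterministic, so a straggler event can be absorbed by re-sorting the executed-event log and recomputing timestamps, rather than rolling back and re-executing work. The Event fields and the example timestamps are made up for the illustration.

```cpp
// Generic sketch of optimistic execution with timestamp correction under
// an assumed deterministic-handler model (illustrative only).
#include <algorithm>
#include <cstdio>
#include <vector>

struct Event {
    double recvTime;   // simulated time at which the event should be handled
    double duration;   // modeled execution cost of the handler
};

// Log of events already executed optimistically; timestamps may need
// correction when a straggler (earlier recvTime) arrives later.
struct ExecutedEvent {
    Event  ev;
    double startTime = 0.0;
    double endTime   = 0.0;
};

// Recompute start/end timestamps in recvTime order. Because handlers are
// assumed deterministic, their results are unchanged; only the simulated
// clock needs fixing, so no state saving or re-execution is sketched here.
void correctTimestamps(std::vector<ExecutedEvent>& log) {
    std::sort(log.begin(), log.end(),
              [](const ExecutedEvent& a, const ExecutedEvent& b) {
                  return a.ev.recvTime < b.ev.recvTime;
              });
    double clock = 0.0;
    for (ExecutedEvent& e : log) {
        e.startTime = std::max(clock, e.ev.recvTime);
        e.endTime   = e.startTime + e.ev.duration;
        clock       = e.endTime;
    }
}

int main() {
    std::vector<ExecutedEvent> log = {{{1.0, 0.5}}, {{3.0, 0.2}}};
    correctTimestamps(log);          // optimistic pass
    log.push_back({{2.0, 0.4}});     // straggler with an earlier recvTime
    correctTimestamps(log);          // absorb it by re-timestamping only
    for (const auto& e : log)
        std::printf("recv=%.1f start=%.1f end=%.1f\n",
                    e.ev.recvTime, e.startTime, e.endTime);
}
```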

The load balance problem presents a significant obstacle to achieving scalability on very large machines. We study load balancing techniques and develop a spectrum of load balancing strategies motivated by several real-world applications. We refine these strategies along multiple dimensions, including communication-aware load balancing, sub-step load balancing, and computation phase-aware load balancing. Using the load balancing framework presented in this thesis, we have scaled NAMD (a classical molecular dynamics application) to 1 TF of peak performance on 3000 processors of PSC Lemieux.
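As a point of reference for measurement-based load balancing in general, the sketch below implements a simple greedy strategy (heaviest measured object to the currently least-loaded processor). It is not one of the thesis's strategies, which additionally weigh communication and phase information; the object loads in main() are arbitrary example values.

```cpp
// Greedy, measurement-based rebalancing sketch (illustrative only).
#include <algorithm>
#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Obj { int id; double load; };   // load measured by runtime instrumentation

// Returns (object id, processor) placements.
std::vector<std::pair<int, int>> greedyAssign(std::vector<Obj> objs, int numProcs) {
    std::sort(objs.begin(), objs.end(),
              [](const Obj& a, const Obj& b) { return a.load > b.load; });
    using P = std::pair<double, int>;  // (current processor load, processor id)
    std::priority_queue<P, std::vector<P>, std::greater<P>> procs;  // min-heap
    for (int p = 0; p < numProcs; ++p) procs.push({0.0, p});

    std::vector<std::pair<int, int>> placement;
    for (const Obj& o : objs) {
        auto [load, p] = procs.top();  // least-loaded processor so far
        procs.pop();
        placement.push_back({o.id, p});
        procs.push({load + o.load, p});
    }
    return placement;
}

int main() {
    std::vector<Obj> objs = {{0, 3.0}, {1, 1.5}, {2, 2.5}, {3, 1.0}, {4, 2.0}};
    for (auto [id, p] : greedyAssign(objs, 2))
        std::printf("object %d -> processor %d\n", id, p);
}
```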

We further motivate the need for next-generation load balancing strategies for petaflops-class machines. We explore a novel design for a scalable hierarchical load balancing scheme that incorporates an explicit memory cost control function, making it easy to adapt to extremely large machines while keeping a small memory footprint. The scheme builds its load database by automatically instrumenting the application at run time, recording both its computation load and its communication pattern, and the load balancing strategy explicitly takes the application's communication pattern into account.
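The schematic C++ sketch below illustrates only the memory argument for a hierarchical scheme, under an assumed two-level grouping and made-up load numbers; it is not the scheme implemented in the thesis. Detailed per-processor data stays inside each group, and only one aggregate number per group crosses group boundaries, so the data any single node must hold stays bounded.

```cpp
// Two-level hierarchical balancing, schematically (illustrative only).
#include <cstdio>
#include <numeric>
#include <vector>

struct Group {
    std::vector<double> procLoads;                 // detailed data kept local
    double total() const {
        return std::accumulate(procLoads.begin(), procLoads.end(), 0.0);
    }
};

int main() {
    std::vector<Group> groups = {{{4.0, 1.0, 1.0}}, {{0.5, 0.5, 1.0}}};

    // Level 1: each group balances internally toward its own average.
    for (Group& g : groups) {
        double avg = g.total() / g.procLoads.size();
        for (double& l : g.procLoads) l = avg;     // stand-in for real migration
    }

    // Level 2: only one number per group (its total) is exchanged globally.
    double grand = 0.0;
    for (const Group& g : groups) grand += g.total();
    double targetPerGroup = grand / groups.size();
    for (size_t i = 0; i < groups.size(); ++i)
        std::printf("group %zu: total=%.1f target=%.1f shift=%+.1f\n",
                    i, groups[i].total(), targetPerGroup,
                    targetPerGroup - groups[i].total());
}
```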

Cited By

  1. Jeannot E, Mercier G and Tessier F. Topology and affinity aware hierarchical and distributed load-balancing in Charm++. Proceedings of the First Workshop on Optimization of Communication in HPC, (63-72).
  2. Acun B, Miller P and Kale L. Variation Among Processors Under Turbo Boost in HPC Systems. Proceedings of the 2016 International Conference on Supercomputing, (1-12).
  3. Wang J, Abu-Ghazaleh N and Ponomarev D (2015). AIR. ACM Transactions on Modeling and Computer Simulation, 25:3, (1-25), Online publication date: 7-May-2015.
  4. Acun B, Gupta A, Jain N, Langer A, Menon H, Mikida E, Ni X, Robson M, Sun Y, Totoni E, Wesolowski L and Kale L. Parallel programming with migratable objects. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (647-658).
  5. Lifflander J, Krishnamoorthy S and Kale L. Optimizing data locality for fork/join programs using constrained work stealing. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (857-868).
  6. Zhang X, Ou J, Davis K and Jiang S. Orthrus. Proceedings of the 29th International Conference on Supercomputing - Volume 8488, (348-364).
  7. Gupta A, Sarood O, Kale L and Milojicic D. Improving HPC application performance in cloud through dynamic load balancing. Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, (402-409).
  8. Lifflander J, Krishnamoorthy S and Kale L. Steal Tree. Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, (507-518).
  9. Lifflander J, Krishnamoorthy S and Kale L (2013). Steal Tree. ACM SIGPLAN Notices, 48:6, (507-518), Online publication date: 23-Jun-2013.
  10. Rodrigues E, Navaux P, Panetta J and Mendes C (2013). Preserving the original MPI semantics in a virtualized processor environment. Science of Computer Programming, 78:4, (412-421), Online publication date: 1-Apr-2013.
  11. Sarood O, Meneses E and Kale L. A 'cool' way of improving the reliability of HPC machines. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-12).
  12. Hou B, Yao Y and Peng S. Empirical Study on Entity Interaction Graph of Large-Scale Parallel Simulations. Proceedings of the 2011 IEEE Workshop on Principles of Advanced and Distributed Simulation, (1-6).
  13. Sarood O and Kale L. A 'cool' load balancer for parallel applications. Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11).
  14. Rodrigues E, Navaux P, Panetta J and Mendes C. A new technique for data privatization in user-level threads and its use in parallel applications. Proceedings of the 2010 ACM Symposium on Applied Computing, (2149-2154).
  15. Bhatelé A, Kalé L and Kumar S. Dynamic topology aware load balancing algorithms for molecular dynamics applications. Proceedings of the 23rd International Conference on Supercomputing, (110-116).
  16. Gioachin F and Kalé L. Memory tagging in Charm++. Proceedings of the 6th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, (1-7).
  17. Huang C, Zheng G, Kalé L and Kumar S. Performance evaluation of adaptive MPI. Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (12-21).
  18. Chakravorty S, Mendes C, Kalé L, Jones T, Tauferner A, Inglett T and Moreira J (2006). HPC-Colony. ACM SIGOPS Operating Systems Review, 40:2, (43-49), Online publication date: 1-Apr-2006.
  19. Zheng G, Huang C and Kalé L (2006). Performance evaluation of automatic checkpoint-based fault tolerance for AMPI and Charm++. ACM SIGOPS Operating Systems Review, 40:2, (90-99), Online publication date: 1-Apr-2006.
  20. Huang C, Lee C and Kalé L. Support for adaptivity in ARMCI using migratable objects. Proceedings of the 20th International Conference on Parallel and Distributed Processing, (383-383).
  21. Agarwal T, Sharma A and Kalé L. Topology-aware task mapping for reducing communication contention on large parallel machines. Proceedings of the 20th International Conference on Parallel and Distributed Processing, (145-145).
  22. Gioachin F, Sharma A, Chakravorty S, Mendes C, Kalé L and Quinn T. Scalable cosmological simulations on parallel machines. Proceedings of the 7th International Conference on High Performance Computing for Computational Science, (476-489).
  23. Jiao X, Zheng G, Alexander P, Campbell M, Lawlor O, Norris J, Haselbacher A and Heath M (2006). A system integration framework for coupled multiphysics simulations. Engineering with Computers, 22:3-4, (293-309), Online publication date: 1-Dec-2006.
  24. Lawlor O, Chakravorty S, Wilmarth T, Choudhury N, Dooley I, Zheng G and Kalé L (2006). ParFUM. Engineering with Computers, 22:3-4, (215-235), Online publication date: 1-Dec-2006.
Contributors
  • University of Illinois Urbana-Champaign