Versatile Communication Cost Modelling for Multicomputer Task Scheduling
Abstract
Programmers face daunting problems when attempting to design portable
programs for multicomputers. This is mainly due to the huge variation
in communication performance on the range of multicomputer platforms
currently in use. These programmers require a computational model that
is sufficiently abstract to allow them to ignore machine-specific
performance features, and yet is sufficiently versatile to allow the
computational structure to be mapped efficiently to a wide range of
multicomputer platforms.
This dissertation focusses on parallel computations that can be
expressed as task graphs: tasks that must be scheduled on the
multicomputer's processors. In the past, scheduling models have only
considered the message delay as the predominant communication
parameter. In the current generation of parallel machines, however,
latency is negligible compared to the CPU penalty of
communication-related activity associated with inter-processor
communication. This CPU penalty cannot be modelled by a latency
parameter because the CPU activity consumes time otherwise available
for useful computation. In view of this, we consider a model in which
the CPU penalty is significant and is associated with communication
events that are incurred when applications execute in parallel.
In this dissertation a new multi-stage scheduling approach that takes
into account these communication parameters is proposed. Initially, in
the first stage, the input task graph is transformed into a new
structure that can be scheduled with a smaller number of communication
events. Task replication is incorporated to produce clusters of tasks.
However, a different view of clusters is adopted. Tasks are clustered
so that messages are bundled and consequently, the number of
communication events is decreased. The communication event tasks are
associated with the relationship between the clusters. More
specifically, this stage comprises a family of scheduling heuristics
that can be customised to classes of parallel machines, according to
their communication performance characteristics, through
parameterisation and by varying the order in which the heuristics are
applied. A second stage is necessary, where the actual schedule on
the target machine is defined. The mechanisms implemented analyse
carefully the clusters and their relationship so that communication
costs are minimised and the degree of parallelism is exploited.
Therefore, the aim of the proposed approach is to tackle the min-max
problem, considering realistic architectural issues.