TASCEL: Task Scheduling Library

TASCEL (pronounced "tassel") is a framework for studying the design of
algorithms that address the challenges of programming abstractions
for finer-grained concurrency. It uses an active message framework
built on MPI, which enables quick prototyping and evaluation on
various platforms with minimal porting effort. The
active message framework enables the design of supporting algorithms
that are concurrent with ongoing execution (e.g., load balancing or
fault recovery concurrent with application execution). TASCEL supports
various threading modes and progress semantics, together with SPMD and
non-SPMD execution, so as to be representative of a variety of useful
execution environments.
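The active-message pattern can be illustrated with a minimal, single-process sketch. All names here (`ActiveMessageEngine`, `register`, `progress`) are hypothetical; the real framework dispatches handlers over MPI messages rather than a local queue.

```python
# Hypothetical sketch of active-message dispatch, simulated in one
# process. TASCEL itself layers this on MPI; here a local deque stands
# in for the network.
from collections import deque

class ActiveMessageEngine:
    def __init__(self):
        self.handlers = {}      # handler name -> callable
        self.inbox = deque()    # simulated incoming-message queue

    def register(self, name, fn):
        self.handlers[name] = fn

    def send(self, name, payload):
        # In the real framework this would be an MPI send; here we
        # simply enqueue the message locally.
        self.inbox.append((name, payload))

    def progress(self):
        # Drain pending messages, invoking the registered handler for
        # each. Running handlers during progress is what lets load
        # balancing or recovery proceed concurrently with the
        # application's own execution.
        while self.inbox:
            name, payload = self.inbox.popleft()
            self.handlers[name](payload)

engine = ActiveMessageEngine()
results = []
engine.register("steal_request", lambda p: results.append(("stolen", p)))
engine.send("steal_request", {"thief": 3})
engine.progress()
```

Decoupling message arrival from handler execution via an explicit `progress` call mirrors the progress-semantics choices mentioned above.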

The following summarizes recent work on algorithms for finer-grained
concurrency abstractions using the TASCEL library.

Scheduling and Load Balancing

Applications often involve iterative execution of identical or
slowly evolving calculations. Such applications require incremental
rebalancing to improve load balance across iterations. We evaluated
two distinct approaches to addressing this challenge:
persistence-based load balancing and work stealing. We developed a
hierarchical persistence-based rebalancing algorithm that performs
localized incremental rebalancing. We also developed a retentive work
stealing algorithm optimized for iterative applications on distributed
memory machines. Retentive work stealing exploits persistence to
incrementally rebalance work and reduce the overhead associated with
random stealing.
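The retentive idea can be sketched as follows. This is a simplified, single-process simulation with hypothetical names; the actual algorithm operates on per-process deques over distributed memory.

```python
# Sketch of retentive work stealing: each iteration records which
# worker actually executed which tasks, and that mapping seeds the
# next iteration, so stealing only corrects residual imbalance.
import random

def steal_iteration(assignment, rng):
    """Run one work-stealing iteration over per-worker task queues.

    An idle worker steals half the queue of a random loaded victim.
    Returns the tasks each worker actually executed.
    """
    queues = [list(tasks) for tasks in assignment]
    executed = [[] for _ in queues]
    while any(queues):
        for w, q in enumerate(queues):
            if q:
                # Busy worker: execute one task from its own queue.
                executed[w].append(q.pop())
            else:
                # Idle worker: steal half of a random victim's queue.
                victims = [v for v in range(len(queues))
                           if len(queues[v]) > 1]
                if victims:
                    v = rng.choice(victims)
                    half = len(queues[v]) // 2
                    q.extend(queues[v][:half])
                    del queues[v][:half]
    return executed

rng = random.Random(0)
# Iteration 1: all work starts on worker 0; stealing spreads it out.
first = steal_iteration([list(range(16)), [], [], []], rng)
# Iteration 2: start from the retained mapping -- far fewer steals
# are needed because the initial distribution is already balanced.
second = steal_iteration(first, rng)
```

Feeding the executed-task mapping forward is what exploits persistence: steals in later iterations only need to correct the slowly evolving imbalance.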

Fault Tolerance

We have designed fault tolerance mechanisms for task parallel
computations operating on global data. We developed three recovery
schemes that present distinct trade-offs: lazy recovery with
potentially increased re-execution cost, immediate collective recovery
with associated synchronization overheads, and noncollective recovery
enabled by additional communication. We employ distributed-memory work
stealing to dynamically rebalance the tasks onto the live
processes. We demonstrated that the space and time overheads of the
fault tolerance mechanism were low, that the costs incurred due to
failures were small, and that the overheads decreased with increasing
per-process work at scale.
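A loose sketch of the recovery bookkeeping, with entirely hypothetical data structures: tasks owned by failed processes whose results did not survive are redistributed over the live processes, after which ordinary work stealing rebalances them dynamically.

```python
# Hypothetical sketch of redistributing lost tasks after a failure.
# `task_owner` maps task id -> rank; `completed` holds tasks whose
# results survive the failure and need no re-execution.
def recover(task_owner, completed, failed_ranks, live_ranks):
    # Tasks that must be re-executed: owned by a failed rank and not
    # already safely completed.
    lost = [t for t, r in task_owner.items()
            if r in failed_ranks and t not in completed]
    # Redistribute round-robin over the live ranks; work stealing
    # then corrects any imbalance during execution.
    new_owner = dict(task_owner)
    for i, t in enumerate(lost):
        new_owner[t] = live_ranks[i % len(live_ranks)]
    return new_owner, lost

owners = {0: 0, 1: 1, 2: 2, 3: 1}
new_owner, lost = recover(owners, completed={0},
                          failed_ranks={1}, live_ranks=[0, 2])
```

The size of `lost` captures the trade-off among the three schemes: lazy recovery defers this computation (risking more re-execution), while the collective and noncollective variants pay synchronization or communication costs to identify it promptly.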

Tracing Work Stealing

Work stealing is inherently flexible and can tolerate variations
due to faults, power variability, or system noise anticipated on
exascale systems. However, such a flexible approach to dealing with
unbalanced
distribution of work results in seemingly irregular computation
structures, complicating the study of the runtime behavior of work
stealing schedulers. Typical approaches to studying work stealing
resort to tracking information on each individual task. Given
that the number of tasks can be orders of magnitude greater than the
number of processor cores, this approach can quickly become
intractable. We have developed an approach to efficiently trace
async-finish parallel programs scheduled using work stealing with low
time and space overheads. We demonstrated the broader applicability of
this work, beyond replay-based performance analysis, through two use
cases: optimizing correctness tools that detect data races in
async-finish programs, and designing retentive work stealing
algorithms for recursive parallel programs.
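The space advantage of logging steals rather than tasks can be illustrated with a deliberately simplified sketch. It linearizes the task order; the actual approach reconstructs schedules of full async-finish trees, and the names below are hypothetical.

```python
# Simplified sketch: reconstruct which worker ran each task from
# steal events alone. `steals` lists (task_index, thief) records
# meaning the remaining work was stolen by `thief` at that point.
def replay_schedule(ntasks, steals):
    # The current executor runs tasks in order until a steal occurs,
    # so logging only the steals -- typically orders of magnitude
    # fewer than tasks -- suffices to replay the whole schedule.
    steal_at = {idx: thief for idx, thief in steals}
    owner_of = []
    current = 0  # worker that starts the sequential execution
    for t in range(ntasks):
        if t in steal_at:
            current = steal_at[t]
        owner_of.append(current)
    return owner_of
```

For example, `replay_schedule(6, [(2, 1), (4, 2)])` recovers a full six-task schedule from just two logged events, which is why trace size scales with the number of steals rather than the number of tasks.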