This thesis develops a new reduction-based analysis methodology for studying the worst-case end-to-end delay and schedulability of real-time jobs in distributed systems. The main result is a simple delay composition rule that computes a worst-case bound on the end-to-end delay of a job, given the computation times of all other jobs that execute concurrently with it in the system. This delay composition rule is first derived for pipelined distributed systems, where all jobs execute on the same sequence of resources before leaving the system. We then derive the delay composition rule for systems where the union of task paths forms a Directed Acyclic Graph (DAG), and subsequently generalize the result to non-acyclic task graphs as well, under both preemptive and non-preemptive scheduling. The result makes no assumptions on periodicity and is valid for both periodic and aperiodic jobs. It applies to fixed- and dynamic-priority scheduling, as long as each job has the same relative priority on all stages on which it executes. The delay composition result enables a simple reduction of the distributed system to an equivalent hypothetical uniprocessor that can be analyzed using traditional uniprocessor schedulability analysis to infer the schedulability of the distributed system. Thus, the wealth of uniprocessor analysis techniques can now be used to analyze distributed task systems. Such a reduction significantly reduces the complexity of analysis and ensures that the analysis does not become exceedingly pessimistic with system scale, unlike existing analysis techniques for distributed systems such as holistic analysis and network calculus. Evaluation using simulations suggests that the new reduction-based analysis significantly outperforms existing analysis techniques, and that the improvement is more pronounced for larger systems.
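To convey the flavor of the result, under preemptive scheduling on an $N$-stage pipeline the bound takes roughly the following form for a job $J_1$ (the notation here is illustrative rather than the precise statement): if $S$ is the set of jobs of equal or higher priority that overlap with $J_1$ in the system, $C_{i,j}$ is the computation time of job $J_i$ on stage $j$, and $C_{i,\max} = \max_j C_{i,j}$, then
\[
\mathrm{Delay}(J_1) \;\le\; \sum_{J_i \in S} 2\, C_{i,\max} \;+\; \sum_{j=1}^{N} \max_{J_i \in S} C_{i,j}.
\]
The key property is that each interfering job contributes only a constant number of terms to the bound, independent of the number of stages, which is why the analysis does not grow increasingly pessimistic with system scale.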
Based on the delay composition results, we develop an algebra, called delay composition algebra, for the systematic transformation of distributed real-time task systems into single-resource task systems such that the schedulability properties of the original system are preserved.
The operands of the algebra represent workloads on composed subsystems, and the operators define ways in which
subsystems can be composed together. By repeatedly applying the operators to the operands representing resource stages, any distributed system can be systematically reduced to an equivalent uniprocessor, which can then be analyzed to determine the end-to-end delay and schedulability properties of all jobs in the original distributed system.
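As a rough sketch of how such an operator-based reduction might proceed (the operand representation, the use of a per-job maximum in the serial operator, and all names below are illustrative assumptions, not the algebra's actual definitions):

```python
# Illustrative sketch of an operator-based reduction (hypothetical
# representation; the thesis defines the actual delay composition
# algebra). Each operand maps job -> load term on a (sub)system, and
# a serial-composition operator merges two adjacent pipeline stages.

def pipe(stage_a, stage_b):
    """Serially compose two stages into one equivalent operand.

    For illustration, a job's load on the composed subsystem is taken
    to be the maximum of its per-stage loads (an assumption made here
    to mirror the per-job, stage-independent terms of the delay
    composition bound)."""
    jobs = set(stage_a) | set(stage_b)
    return {j: max(stage_a.get(j, 0), stage_b.get(j, 0)) for j in jobs}

def reduce_pipeline(stages):
    """Fold a list of per-stage operands into a single 'uniprocessor'
    operand by repeated application of the serial operator."""
    result = stages[0]
    for stage in stages[1:]:
        result = pipe(result, stage)
    return result

# Three jobs traversing a three-stage pipeline.
stages = [
    {"J1": 2, "J2": 1},
    {"J1": 3, "J2": 4, "J3": 2},
    {"J1": 1, "J3": 5},
]
uniprocessor = reduce_pipeline(stages)
print(sorted(uniprocessor.items()))  # [('J1', 3), ('J2', 4), ('J3', 5)]
```

The resulting single operand can then be handed to a uniprocessor schedulability test, which is the essence of the reduction.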
The above reduction-based schedulability analysis techniques suffer from pessimism that results
from mismatches between uniprocessor analysis assumptions and characteristics of workloads
reduced from distributed systems, especially for the case of periodic tasks. To address the problem, we introduce
{\em flow-based mode changes\/}, a uniprocessor load model
tuned to the novel constraints of workloads reduced from distributed system tasks. In this model, the transition of a job from one resource to another in the distributed system is modeled as a mode change on the uniprocessor. We present a new iterative solution to compute the worst-case end-to-end delay
of a job in the new uniprocessor task model. Our simulation
studies suggest that the resulting schedulability analysis admits over 25\% more utilization than existing techniques, while still guaranteeing that all end-to-end deadlines of tasks are met.
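For context, the iterative computation is in the spirit of classical uniprocessor response-time analysis, which the flow-based model extends to account for mode changes; the classical base iteration (standard textbook form, not the thesis's extended algorithm) is:

```python
import math

def response_time(C, higher, max_iter=1000):
    """Classical fixed-priority response-time iteration:
    R_{n+1} = C + sum_i ceil(R_n / T_i) * C_i over higher-priority
    tasks, iterated to a fixed point. `higher` is a list of
    (C_i, T_i) pairs; returns None if the iteration does not
    converge within max_iter steps."""
    R = C
    for _ in range(max_iter):
        R_next = C + sum(math.ceil(R / T) * Ci for Ci, T in higher)
        if R_next == R:
            return R
        R = R_next
    return None

# Task with C=3 and higher-priority tasks (C=1, T=4) and (C=2, T=10).
print(response_time(3, [(1, 4), (2, 10)]))  # 7
```

The flow-based model must additionally track which mode (resource stage) each interfering task is in, which is what makes the extended iteration non-trivial.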
As systems become increasingly distributed, it is important to understand their {\em structural robustness\/} with respect to timing uncertainty. Structural robustness, a concept that arises by virtue of multi-stage execution, refers to the robustness of the end-to-end timing behavior of an execution graph to unexpected timing violations at individual execution stages. A robust topology is one in which such violations minimally affect end-to-end execution delay. We show that the manner in which resources are allocated to execution stages affects this robustness, and present resource allocation algorithms that improve the robustness of execution graphs. Evaluation shows that these algorithms reduce deadline misses due to unpredictable timing violations by 40--60\%. Hence, the approach is important for soft real-time systems and for systems where timing uncertainty exists or where worst-case timing is not fully verified.
Finally, we show two contexts in which the above theory can be applied to the domain of wireless networks. First, we develop a bandwidth allocation scheme for elastic real-time flows in multi-hop wireless networks. The
problem is cast as one of utility maximization, where each flow has a utility that is a concave
function of its flow rate, subject to delay constraints. The delay constraints are obtained from our
end-to-end delay bounds and adapted to only use localized information available within the neighborhood of each
node. We formulate and solve a constrained network utility maximization problem, whose solution yields a distributed algorithm that each node can execute independently to maximize global utility. Second, we study the problem of minimizing the worst-case end-to-end delay of packets of flows
in a wireless network under arbitrary schedulability constraints. Using a coordinated earliest-deadline-first strategy, we obtain a worst-case end-to-end delay bound of the same form as our delay composition results for distributed systems.
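In outline, the bandwidth allocation problem has the standard network-utility-maximization shape (the notation below is illustrative): with flow rates $x_f$ and concave utilities $U_f$, we solve
\[
\max_{x \ge 0} \; \sum_{f} U_f(x_f) \qquad \text{subject to} \qquad D_f(x) \le d_f \quad \forall f,
\]
where $D_f(x)$ is an end-to-end delay expression derived from the delay composition bounds (localized to each node's neighborhood) and $d_f$ is the deadline of flow $f$; dual decomposition of such problems is what typically yields a per-node distributed algorithm.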
We discuss several avenues for future work that build on the theory developed in this thesis. We hope that
this thesis will provide the foundation to develop a more comprehensive and widely applicable theory for the study of
delay, schedulability, and other end-to-end properties in distributed systems.