Minimizing latency of agreement protocols

This technical report is based on a dissertation submitted September
2005 by the author for the degree of Doctor of Philosophy to the
University of Cambridge, Trinity Hall.

Abstract

Maintaining consistency of fault-tolerant distributed systems is
notoriously difficult to achieve. It often requires non-trivial
agreement abstractions, such as Consensus, Atomic Broadcast, or Atomic
Commitment. This thesis investigates implementations of such
abstractions in the asynchronous model, extended with unreliable failure
detectors or eventual synchrony. The main objective is to develop
protocols that minimize the number of communication steps required in
failure-free scenarios but remain correct if failures occur. For several
agreement problems and their numerous variants, this thesis presents
such low-latency algorithms and lower-bound theorems proving their
optimality.

The observation that many agreement protocols share the same round-based
structure helps to cope with a large number of agreement problems in a
uniform way. One of the main contributions of this thesis is
“Optimistically Terminating Consensus” (OTC) – a new lightweight
agreement abstraction that formalizes the notion of a round. It is used
to provide simple modular solutions to a large variety of agreement
problems, including Consensus, Atomic Commitment, and Interactive
Consistency. The OTC abstraction tolerates malicious participants and
has no latency overhead; agreement protocols constructed in the OTC
framework require no more communication steps than their ad-hoc
counterparts.

The attractiveness of this approach lies in the fact that the
correctness of OTC algorithms can be tested automatically. A theory
developed in this thesis allows us to quickly evaluate OTC algorithm
candidates without the time-consuming examination of their entire state
space. This technique is then used to scan the space of possible
solutions in order to automatically discover new low-latency OTC
algorithms. From these, one can now easily obtain new implementations of
Consensus and similar agreement problems such as Atomic Commitment or
Interactive Consistency.

Because of its continuous nature, Atomic Broadcast is considered
separately from other agreement abstractions. I first show that no
algorithm can guarantee a latency of less than three communication steps
in all failure-free scenarios. Then, I present new Atomic Broadcast
algorithms that achieve the two-step latency in some special cases,
while still guaranteeing three steps for other failure-free scenarios.
The special cases considered here are: Optimistic Atomic Broadcast,
(Optimistic) Generic Broadcast, and closed-group Atomic Broadcast. For
each of these, I present an appropriate algorithm and prove its latency
to be optimal.