Understanding Network Connections

Butler Lampson

There are lots of protocols for establishing connections (or equivalently, doing at-most-once message delivery) across a network that can delay, reorder, duplicate and lose packets. Most of the popular ones are based on a three-way handshake, but some use clocks or extra stable storage operations to reduce the number of messages required. It’s hard to understand why the protocols work, and there are almost no correctness proofs; even careful specifications are rare.

I will give a specification for at-most-once message delivery, an informal account of the main problems an implementation must solve and the common features that most implementations share, and outlines of proofs for three implementations. The specifications and proofs are based on Lamport’s methods for using abstraction functions to understand concurrent systems, and I will say something about how his methods can be applied to many other problems of practical interest.

Understanding Network Connections

Butler Lampson

October 30, 1995

This is joint work with Nancy Lynch.

The errors in this talk are mine, however.

Overview

Specify at-most-once message delivery.

Describe other features we want from an implementation.

Give a framework for thinking about implementations.

Show how to prove correctness of an implementation.

The Problem

Network Connections
or
Reliable At-Most-Once Messages

Messages are delivered in FIFO order.

A message is not delivered more than once.

A message is acked only if delivered.

A message or ack is lost only if it is being sent between crash and recovery.
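The first two properties can be illustrated with a small trace checker. This is only a sketch under simplifying assumptions: the event encoding `("put", m)` / `("get", m)` and the function names are hypothetical, and the checker ignores acks and crashes, so it accepts any loss rather than only loss between crash and recovery.

```python
def is_subsequence(delivered, sent):
    """True if `delivered` can be obtained from `sent` by deleting items."""
    remaining = iter(sent)
    # `m in remaining` advances the iterator past the first match, so each
    # sent message is matched at most once, and only in sending order.
    return all(m in remaining for m in delivered)


def check_trace(trace):
    """Check FIFO order and at-most-once delivery on a list of events.

    Each event is ("put", m) or ("get", m) (a hypothetical encoding).
    """
    sent, delivered = [], []
    for kind, m in trace:
        (sent if kind == "put" else delivered).append(m)
        # The property must hold after every event, so a message cannot be
        # delivered before it is sent, out of order, or twice.
        if not is_subsequence(delivered, sent):
            return False
    return True
```

For example, `check_trace([("put", "a"), ("get", "a"), ("get", "a")])` is rejected because "a" would be delivered twice.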

Pragmatics

“Everything should be made as simple as possible, but no simpler.” A. Einstein

Make progress: regardless of crashes, if both ends stay up a waiting message is sent, and otherwise both parties become idle.

Idle at no cost: an idle agent has no state that changes for each message, and doesn’t send any packets.

Minimize stable storage operations: far fewer than one per message (<< 1).

Use channels that are easy to implement: they may lose, duplicate, or reorder messages.

Pragmatics

Some pragmatic issues we won’t discuss:

Retransmission policy.

Detecting failure of an attempt to send or ack, by timing it out.

Describing a System

A system is defined by a safety and a liveness property:

Safety: nothing bad ever happens. Defined by a state machine:

A set of states. A state is a pair (external state, internal state).
A set of initial states.
A set of transitions from one state to another.

Liveness: something good eventually happens.

An action is a named set of transitions; actions partition the
transitions.

For instance: put(m); get(m); crash_s

A history is a possible sequence of actions, starting from an initial
state.

The behavior of the system is the set of possible histories.

An external action is one in which the external state changes. Correspondingly there are external histories and behaviors.
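As a rough illustration of these definitions, the following sketch builds a tiny state machine and enumerates its histories up to a bounded length. The machine itself (a queue of at most two messages with put and get actions) and all names are assumptions chosen for the example; the whole state is treated as external.

```python
MESSAGES = ("red", "blue")   # hypothetical message alphabet
INITIAL = [()]               # one initial state: the empty queue


def transitions(state):
    """Yield (action_name, next_state) pairs for the transitions enabled in `state`."""
    if len(state) < 2:       # bound the queue so the machine stays finite
        for m in MESSAGES:
            yield (f"put({m})", state + (m,))
    if state:
        yield (f"get({state[0]})", state[1:])


def histories(max_len):
    """All action sequences of length <= max_len that start from an initial state."""
    result = set()
    frontier = [((), s) for s in INITIAL]
    for _ in range(max_len + 1):
        next_frontier = []
        for history, state in frontier:
            result.add(history)
            for action, nxt in transitions(state):
                next_frontier.append((history + (action,), nxt))
        frontier = next_frontier
    return result
```

Here `histories(n)` is a finite approximation of the behavior: `("put(red)", "get(red)")` is a history, while `("get(red)",)` is not, because no get transition is possible from the initial state.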

Defining Actions

An action is:

A name, possibly with parameters: put(“red”).

A guard, a predicate on the state which must be true for this action to be a possible transition: q ≠ < > and i > 3.

An effect, changes in some of the state variables: i := i + 1.

The entire action is atomic.

Example:

Name: get(m)
Guard: m first on q
Effect: take first from q; if q now empty and status = ? then status := true
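The name, guard and effect structure can be sketched in code as follows. The `Action` class, the dictionary representation of the state, and the encoding of the status value ? as the string `"?"` are assumptions made for illustration. Atomicity holds here only because the sketch is single-threaded.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    guard: Callable[[dict], bool]    # predicate on the state
    effect: Callable[[dict], None]   # changes some of the state variables

    def fire(self, state: dict) -> bool:
        """Apply the effect if the guard holds; return whether the action fired."""
        if not self.guard(state):
            return False
        self.effect(state)
        return True


def get(m):
    """Build the get(m) action from the example."""
    def guard(state):
        return bool(state["q"]) and state["q"][0] == m   # m first on q

    def effect(state):
        state["q"].pop(0)                                 # take first from q
        if not state["q"] and state["status"] == "?":     # q now empty and status = ?
            state["status"] = True

    return Action(f"get({m})", guard, effect)
```

With `state = {"q": ["red"], "status": "?"}`, calling `get("blue").fire(state)` leaves the state unchanged because the guard is false, while `get("red").fire(state)` empties q and sets status to true.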

Abstraction function

Methodology for Proofs

Simplify the spec and the implementations. Save clever encodings for later.

Make a “working spec” that’s easier to handle:
It implements the actual spec.
It has as much non-determinism as possible.
All the prophecy is between it and the actual spec.

actual spec ← implements ← working spec ← implements ← implementation

Find the abstraction function. The rest is automatic.

Give names to important functions of your state variables.

To design an implementation, first invent the guards you need, then figure out how to implement them.
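One way to see why the abstraction function does most of the work is to check it mechanically on a toy example. In the sketch below the spec is a queue, and the implementation (an assumption invented for this example, not one of the protocols in the talk) is an append-only log with a read index. The check confirms that every reachable implementation step maps, through the abstraction function, to the same step of the spec.

```python
def spec_step(action, q):
    """Spec: a queue. Return the next state, or None if the action is not enabled."""
    kind, m = action
    if kind == "put":
        return q + (m,)
    if kind == "get" and q and q[0] == m:
        return q[1:]
    return None


def impl_steps(state):
    """Implementation: an append-only log plus a read index (hypothetical design)."""
    log, head = state
    if len(log) < 3:                      # bound the log to keep the search finite
        for m in ("red", "blue"):
            yield ("put", m), (log + (m,), head)
    if head < len(log):
        yield ("get", log[head]), (log, head + 1)


def abstraction(state):
    """Map an implementation state to the spec state it represents."""
    log, head = state
    return log[head:]


def check_simulation(initial=((), 0)):
    """Check that every reachable implementation step maps to a spec step."""
    seen, todo = {initial}, [initial]
    while todo:
        s = todo.pop()
        for action, s2 in impl_steps(s):
            if spec_step(action, abstraction(s)) != abstraction(s2):
                return False
            if s2 not in seen:
                seen.add(s2)
                todo.append(s2)
    return True
```

A real proof also has to handle internal actions that map to no spec step, which this toy example does not have.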

History Variables

If you add a variable h to the state space such that

If s is an old initial state then there’s an h such that (s, h) is initial;

If (s, h) → (s', h') then s → s';

If s → s' then for any h there’s an h' such that (s, h) → (s', h')

then the new state machine has the same histories as the old one.
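A small experiment can make the claim concrete. The machine below (a counter with inc and reset actions) and the history variable h (a count of resets so far) are assumptions made up for the example. Because h is only written and never appears in a guard, the three conditions hold, and the sketch compares the two sets of histories up to a bounded depth.

```python
def old_steps(s):
    """Old machine: a counter in {0, 1, 2} with inc and reset actions."""
    if s < 2:
        yield "inc", s + 1
    if s > 0:
        yield "reset", 0


def new_steps(state):
    """Same machine plus a history variable h that counts resets so far.

    h is written but never read by a guard, so it can neither block
    nor add transitions of the old machine.
    """
    s, h = state
    for action, s2 in old_steps(s):
        yield action, (s2, (h + 1) if action == "reset" else h)


def histories(steps, initial, depth):
    """Set of action sequences of length <= depth starting from `initial`."""
    result, frontier = set(), [((), initial)]
    for _ in range(depth + 1):
        nxt = []
        for hist, state in frontier:
            result.add(hist)
            nxt.extend((hist + (a,), s2) for a, s2 in steps(state))
        frontier = nxt
    return result
```

Comparing `histories(old_steps, 0, 5)` with `histories(new_steps, (0, 0), 5)` gives equal sets, as the conditions predict. A bounded comparison is of course only evidence, not a proof.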

Predicting Non-Determinism

Suppose we add mode := acked to crash_s.

Consider the sequence put(“red”), snd, crash_s, put(“blue”), snd.

Now we have sr = {(1, “red”), (2, “blue”)}. We need an ordering on identifiers to order these packets and maintain FIFO delivery. On rcv_sr(i, m) the receiver must remove all identifiers ≤ i from good_r.

But now “red” is lost if (2, “blue”) is received first. If we use the obvious abstraction function

q = the m’s from {(i, m) ∈ sr ∪ {(last_s, cur)} | i ∈ good_r} sorted by i,

this loss doesn’t happen between crash_s and recover_s, as allowed by the spec, but later at the rcv.
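The scenario can be replayed in a few lines. This is a sketch under stated assumptions: after the second snd the sender holds last_s = 2 and cur = “blue”, the receiver's good_r is {1, 2}, and the packet stays in sr after being received (the channel may duplicate). The function names are hypothetical.

```python
def abstract_q(sr, last_s, cur, good_r):
    """The 'obvious' abstraction function: messages with a good identifier, sorted by identifier."""
    packets = set(sr) | {(last_s, cur)}
    return [m for i, m in sorted(packets) if i in good_r]


def rcv(i, m, good_r):
    """Deliver m if i is good, then remove all identifiers <= i from good_r."""
    if i not in good_r:
        return None, good_r
    return m, {j for j in good_r if j > i}


sr = {(1, "red"), (2, "blue")}   # packets in the sender-to-receiver channel
last_s, cur = 2, "blue"          # assumed sender state after the second snd
good_r = {1, 2}                  # assumed receiver state

before = abstract_q(sr, last_s, cur, good_r)   # ['red', 'blue']
delivered, good_r = rcv(2, "blue", good_r)     # delivers 'blue'
after = abstract_q(sr, last_s, cur, good_r)    # []
```

The abstract queue goes from [“red”, “blue”] to empty in a single rcv step that delivers only “blue”, so under this abstraction function “red” vanishes at the rcv rather than between crash_s and recover_s.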