Position Paper for the ACM SigOps Workshop 1988
-----------------------------------------------
Andrew Birrell
Digital Equipment Corporation
Systems Research Center
130 Lytton Avenue
Palo Alto, CA 94301, U.S.A.
1. The Desire for Autonomy
---------------------------
Why build a distributed system? Why not spend the same money and
effort on a centralized system? There are several attractions to
sticking with a centralized system.
a) It's easier to build. Building powerful centralized systems requires
little research. The technologies are well understood,
even if the requirements include high availability in the presence
of failures.
b) It performs better. A distributed system necessarily involves a
static partition of the total resources into smaller boxes. A
centralized system can partition dynamically, responding to the
time-varying offered load.
c) It's easier to manage. You employ one system manager for the
centralized system, and he does the management as his job; it's what
he gets paid for. Distributed systems typically make each user act as
a system manager, even though it's not viewed as a productive part
of their job. This isn't a necessary consequence of distribution,
but it seems to happen with today's systems.
So the advocates of distributed systems must be able to justify why
their customers should forego these benefits. The possibility that
distributed systems research might be fun is not enough. Advocates
of distribution often give variants of the following justifications.
d) The same-cost distributed system will be more powerful. This
argument says that distributed systems are presently more cost-
effective.
e) The distributed system will allow for high-bandwidth user
interaction. This argument says that economic high bandwidth
links must be shorter than the distance to the machine room.
f) The entire system must be widely dispersed geographically. This
is a special case of (e); long-distance high-bandwidth links are
indeed prohibitively expensive.
g) The users will be more productive if they are insulated from
failures of centrally managed resources. This is part of the
"autonomy" argument.
h) Users of a distributed system can choose to be their own system
managers, if they wish to configure their part of the system in
non-standard ways. This is the positive side of (c) above. This
is also part of the autonomy argument.
I am unaware of published numbers that support the cost-effectiveness
argument (d). Indeed, the empirical evidence provided by IBM's share
price indicates that (d) is false, despite the evidence in its favour
provided by Sun Microsystems' share price.
There is a little truth to (e). Experiments with running the X window
protocol across 9600 baud lines (with data compression) are disappointing;
that bandwidth appears to be inadequate to keep the screens up to
date in the style to which we have become accustomed. But only just
inadequate. A 19.2K line would probably be fine for 800 x 1000 black
and white displays, and a 56K line would be easy. A 0.5M line would
certainly be good enough for high quality color. These speeds are
easy to achieve within a local area. This indicates that user interaction
would be well satisfied by a centralized system with X window terminals
connected to it by a medium speed star-shaped local area network.
Argument (f) is real. But most of the distributed systems we have
seen don't work satisfactorily across wide areas either.
This leaves us with the two parts of the autonomy argument as the
dominant rationale for building distributed systems. Even if the
other arguments were partially true, autonomy would still stand out
as an important reason for distribution, probably the major one apart
from wide-area geographic distribution.
2. Loss of Autonomy
--------------------
Some of the hardest design issues that we face revolve around the
conflict between this desire for autonomy, and the inevitable loss
of autonomy that comes from access to shared services.
The simplest distributed systems are formed from network interconnection
of totally autonomous machines. These systems offer network services
that are used only in response to explicit user requests (such
as file transfers in the FTP style, or explicit remote login). In
such systems there is little loss of autonomy: each machine, be it
a workstation, a VAX 8800 or a large mainframe, will be as usable in
the presence of network or server failures as it would have been without
the addition of network services. The effect of failure of a network
service will be at most to cause the explicit user request to fail.
The user is denied only access to the failed resource, and it is
necessarily a resource that the user is explicitly aware of.
As distributed systems have become more sophisticated, the machines
have come to depend on their services more intimately. At the extreme,
the entire distributed system behaves as a single distributed time-
sharing system, vulnerable to the failure of its components. When
sufficient components fail, the entire system becomes monolithically
unavailable. This effect was once described (by Leslie Lamport) as
"a distributed system is one where my program can crash because of
some computer that I've never heard of".
There is room for argument about whether failure of a network component
should be able to cause failure of an individual workstation. However,
it seems clear (to me) that the failure of a network component shouldn't
be able to cause failure of a large system such as a mainframe or
a building full of workstations. In other words: failure of a small
part must not be able to cause failure of a much larger part. In
other words: for large enough components of the system, autonomy is
a requirement, not just a desire.
One approach to avoiding this loss of autonomy is to work hard at
reducing the probability of network-based failures to an "acceptably"
low value. We believe this does not solve the problem. The available
techniques (including replication) can reduce some failure modes,
but they have no impact on several key failures, notably: administrative
errors, replicated bugs and environmental problems (air conditioning
design, or the city's electricity supply).
Another approach is to replicate each necessary service everywhere
you care about retaining autonomy. For example you could run
a name server replica and a satellite time receiver on each mainframe.
This would be acceptable from the point of view of the extra resource
it would consume on the mainframe (or workstation cluster). But it
is likely to be unacceptable to the name service because of the vast
number of replicas it would create. And even this technique doesn't
handle the problems of administrative failure and replicated bugs.
3. Regaining Autonomy
----------------------
One technique would, if feasible, solve this problem. We could arrange
that for each critical service, if the network service is unavailable
but is required for continued local operation, then an acceptable
substitute is available locally. A trivial example of this is that
if the network time service is unavailable, we will use the local
clock instead.
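To make even this trivial case concrete, here is a minimal sketch in
Python. The names (network_time, TimeServiceUnavailable) are invented
for illustration and do not correspond to any real service interface.

    import time

    class TimeServiceUnavailable(Exception):
        """Raised when the network time service cannot be reached."""

    def network_time():
        # Placeholder for a real call to the network time service;
        # here we simply pretend the service is down.
        raise TimeServiceUnavailable()

    def current_time():
        # Prefer the network time service, but fall back to the
        # acceptable local substitute: the machine's own clock.
        try:
            return network_time()
        except TimeServiceUnavailable:
            return time.time()

    print(current_time())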
We (Mike Burrows, Roger Needham, Michael Schroeder and the present
author) have been investigating the practicality of alleviating the
autonomy problem by a general facility known (hereafter) as "stashing".
A stash is akin to a cache, in that it contains results of previous
accesses to network services. But the distinguishing feature is that
the values in the stash will be used in place of the network service
if the service is unavailable. In other words, values in a stash are
most useful at the times when they cannot be verified; values in a
cache merely optimize the verification process.
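For concreteness, the following Python sketch contrasts the two lookup
rules. The service, cache and stash here are invented stand-ins (a
callable and two dictionaries), and the cache is simplified (it does
not re-validate entries); the point is only where each structure sits
relative to a service failure.

    class ServiceUnavailable(Exception):
        pass

    def lookup_with_cache(service, cache, key):
        # A cache merely short-circuits repeated lookups; on a miss
        # the service must be consulted, so a failure of the service
        # propagates to the caller.
        if key in cache:
            return cache[key]
        value = service(key)           # raises if the service is down
        cache[key] = value
        return value

    def lookup_with_stash(service, stash, key):
        # A stash is the other way round: always prefer the live
        # service, but record its answers so that they survive a
        # failure of the service.
        try:
            value = service(key)
            stash[key] = value
            return value
        except ServiceUnavailable:
            # Used precisely when the value cannot be verified.
            return stash[key]

    # Example: a service that is currently down, and a value that was
    # stashed while it was up.
    def down_service(key):
        raise ServiceUnavailable()

    stash = {"host-a": "10.0.0.1"}
    print(lookup_with_stash(down_service, stash, "host-a"))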
A cache contains a reasonably arbitrary selection of things recently
referenced. A stash is a more principled structure, containing things
that are believed to be important to the continued well-being of the
local machine. For example, even though you haven't logged in since
yesterday, you would expect your mainframe to still retain the
information about your user account - even if the name service is down.
In the case of the name service we propose to stash the arguments
and results of all name service calls. This is not the same as having
a copy of part of the name service name-space; there is no reason
in general to suppose that there are units of name-space that correspond
properly to what is used on a particular machine. In ordinary operation,
with the name service available, we use it and only update the
stash, never consult it. If the name service is unavailable, no updates
are allowed to the name service data in the stash. (There is also
some attraction to using the stashed data as a name service cache,
but that is not relevant here.)
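One way this policy might look, as a Python sketch with invented names
(remote_call, NameServiceUnavailable) and a deliberately simplified
interface: every (operation, arguments) pair is recorded with its
result while the service is reachable, and the stash is read, but
never modified, while it is not.

    class NameServiceUnavailable(Exception):
        pass

    class StashedNameService:
        def __init__(self, remote_call):
            # remote_call(op, *args) performs the real name service
            # call and raises NameServiceUnavailable on failure.
            self.remote_call = remote_call
            self.stash = {}                # (op, args) -> result

        def call(self, op, *args):
            key = (op, args)
            try:
                result = self.remote_call(op, *args)
            except NameServiceUnavailable:
                # Service down: answer from the stash if we can, and
                # make no changes to the stashed data.
                if key in self.stash:
                    return self.stash[key]
                raise
            # Service up: use its answer and record it in the stash;
            # the stash is never consulted on this path.
            self.stash[key] = result
            return result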
Files could be stashed. The stashed file data would participate in
any general purpose file caching scheme. The stash would include
all the files necessary to the running of the machine. Hopefully,
most of these are immutable. If mutable stashed files are also used
as a cache, then they must participate in file caching schemes (next
section), with the additional proviso that they cannot just be
"invalidated" - having a file stashed means that it's always available.
Most likely, a stashed file would not be writable when the file service
is unavailable.
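A sketch of that behaviour, again with invented names (the file_service
object and FileServiceUnavailable are stand-ins): reads fall back to
the stashed copy when the service is down, and writes are simply
refused.

    class FileServiceUnavailable(Exception):
        pass

    class StashedFiles:
        def __init__(self, file_service):
            self.fs = file_service
            self.stash = {}                # path -> file contents

        def stash_file(self, path):
            # Designate a file as necessary to the running of the
            # machine, and take an initial copy of it.
            self.stash[path] = self.fs.read(path)

        def read(self, path):
            try:
                data = self.fs.read(path)
                if path in self.stash:
                    self.stash[path] = data    # keep the copy fresh
                return data
            except FileServiceUnavailable:
                # Having a file stashed means it is always available.
                return self.stash[path]

        def write(self, path, data):
            # No writes to a stashed file while the service is down:
            # this simply propagates the failure to the caller.
            self.fs.write(path, data)
            if path in self.stash:
                self.stash[path] = data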
File directory entries could be stashed. Or entire file directories.
If stashed directory data is used as a cache, it must participate
in the caching schemes (next section); if not then updates must be
written through to the backing file service.
A stashed directory could be updated even if the backing file service
is unavailable. In this case the updates would be written through to the
file service when the file service is next available. The file service
could resolve any resulting conflicts using a LOCUS(tm) style algorithm.
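The following sketch shows that write-behind behaviour for a single
stashed directory (names invented; the LOCUS-style conflict resolution
itself is assumed to live in the file service and is not shown).

    class FileServiceUnavailable(Exception):
        pass

    class StashedDirectory:
        def __init__(self, file_service, name):
            self.fs = file_service
            self.name = name
            self.entries = {}      # stashed directory contents
            self.pending = []      # updates made while disconnected

        def add_entry(self, entry, target):
            try:
                self.fs.add_entry(self.name, entry, target)
            except FileServiceUnavailable:
                # Update the stash now; write it through later.
                self.pending.append((entry, target))
            self.entries[entry] = target

        def reconnect(self):
            # Called when the file service becomes available again:
            # replay the queued updates, leaving the service to detect
            # and resolve any conflicting updates made elsewhere.
            for entry, target in self.pending:
                self.fs.add_entry(self.name, entry, target)
            self.pending.clear()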
We have a number of half-baked ideas on what the rules are for stashing
a name service value or a file or directory entry. For example: keep
an item stashed for thirty days after it has been used on each of two
successive days.
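Purely as an illustration of what such a rule might look like in code
(the thresholds are the example above, not a recommendation):

    from datetime import date, timedelta

    def should_keep_stashed(use_dates, today):
        # Keep an item stashed for thirty days after it has been used
        # on each of two successive days.
        days = set(use_dates)
        for d in days:
            next_day = d + timedelta(days=1)
            if next_day in days and today <= next_day + timedelta(days=30):
                return True
        return False

    # Used on two successive days earlier this month, so still stashed.
    print(should_keep_stashed(
        [date(1988, 6, 1), date(1988, 6, 2)], date(1988, 6, 20)))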
4. Conclusions
---------------
Autonomy is a critical part of the argument for local area distributed
systems. Our present experience is that as distributed systems become
more powerful, autonomy diminishes. Fault-tolerant software and hardware
configurations can alleviate, but not eliminate, this problem. The
technique of stashing offers some hope. But we do not yet understand
whether stashing is feasible. Stashing replaces failure by out-of-date
answers; we do not know whether this is acceptable to users and systems.
We do not understand the appropriate strategies for building stashes
for particular applications.
Acknowledgement
---------------
These views and ideas (such as they are) arose out of meetings amongst
Mike Burrows, Roger Needham, Michael Schroeder and the present author.
But any deficiencies in the presentation are probably my fault, and
I have not checked that the other participants still hold these views:
they might disagree radically by now.