The Bees and the Ants: The Benefits of
Persistent and Addressable Media (Like the Web)

Sandro Hawke

W3C MIT/LCS
200 Technology Square, Cambridge, MA 02139, USA

sandro@w3.org

DRAFT $Id: paper.html,v 1.16 2002/11/28 01:32:37 sandro Exp $

Abstract

This paper approaches issues in the web's evolving design by
characterizing the web as a shared environment where
information may be placed for others to perceive. This view
allows analogies to other communication systems throughout
history, with a special focus on systems of publication, and this
helps illuminate issues in web architecture.

Keywords

Status

In progress, unclear -- may need massive retargeting. New message:
HTTP presents a glorified file-system interface; why is that good, and
how should we use it?

1. Introduction

Despite the success of the web, or perhaps because of it, the set
of design principles it follows and embodies (its "architecture") has
never been thoroughly documented or agreed upon. This causes ongoing
problems as people attempt to extend the web into new areas and to
continue to implement what they perceive to be its original design.
As in any complex system, construction and maintenance done without a
clear overall picture carries a risk of increased friction and
complexity, decreased usability, and even catastrophic failure.

Each web author adds a little to the system, while each creator of
authoring tools or browsers adds quite a bit more. With larger
contributions, the risks to the overall system (the web) are greater.
Occasionally great controversies arise as architectural features are
added which may conflict with one another. While the furor has died
down, some people are still upset about cookies, JavaScript, server
push, and many other old issues. In the past two years, the web
services and semantic web activities have emerged with loud claims of
their own, and each presents significant risks if deployed in an
environment where the web's existing architecture is misunderstood.

In the past year, the W3C's Technical Architecture Group (TAG) has
begun to develop authoritative expertise and documentation to address
these problems, but their work is still incomplete. This paper grew
out of a private technical briefing (with ensuing discussion) on the
relationship between web services and the semantic web; my hope is
that it will contribute to the general understanding of that
relationship and perhaps to the larger discussion of web
architecture.

2. A Shared Environment

2.1. Two Ways to Communicate

As far as we can tell, information travels from one person's mind
to another's by way of the physical world. One person affects the
world, and another perceives it. Sometimes the effect is vibrations
in the air, and another hears it; sometimes the effect is marks on a
piece of paper, which another might see. More recently, with computer
systems, the intermediate effects might be tiny electrical currents or
magnetic domains, but the result is the same: people are
generally able to convey their thoughts, with some degree of accuracy
and at some cost, to other people.

Specific techniques for communication have varied over the
centuries and with circumstances. Today, we have an enormous range of
options, from the most ancient to the latest gadgets. Among all
these, there remains a strong split between techniques which involve
a persistent effect on the environment and those which are by nature
transitory. Shouting out a warning is transitory; erecting a big red
sign is persistent.

This division is not absolute: the erected sign might immediately
collapse, and the shouted warning might echo for several seconds, but
the qualitative difference remains. Persistent messages are issued
with much less idea of and control over who will receive the
information and with less opportunity for feedback and continued
interaction. Balancing these losses, persistent messages can gain a
much larger audience across time and space, and can reach some
audiences at a much lower cost (such as with a posted warning). For
the receiver, getting information from persistent messages offers the
freedom, control, and simplicity of wandering in a bookstore, whereas
obtaining transitory information requires complex social interactions
to locate informed people and get them to communicate.

The differences between communicating by making persistent changes
to the environment versus just "sending a message" are everywhere. In
designing distributed or multi-threaded computer applications,
developers weigh these same two approaches for arranging how the
computer processes communicate, calling them "message-passing" and
"shared-memory". With message-passing, they imagine a process
constructing a digital message and transmitting it to one or more
receivers. People using computers behave similarly when they send
e-mail. With shared-memory, by contrast, an area of storage is
allocated where one or more processes can place information for others
to later see (and perhaps modify). People using their computers
communicate like this when they author and read web pages.
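
A minimal sketch may make the contrast concrete. It is written in
Python purely for illustration; the names mailbox and shared_board are
my own, and a queue and a dictionary stand in for a real network and a
real shared store.

```python
import queue

# Message-passing: the sender constructs a message and hands it to a
# channel; once a receiver takes it, the message is gone.
mailbox = queue.Queue()
mailbox.put("food at the oak tree")
first_reader = mailbox.get()
second_reader_sees_anything = not mailbox.empty()

# Shared memory: the writer places information at a named location;
# any number of readers can look at it later, in any order.
shared_board = {}
shared_board["food"] = "at the oak tree"
reader_a = shared_board["food"]
reader_b = shared_board["food"]  # still there for a late arrival
```

The late arrival in the first half finds an empty mailbox and would
have to ask for the message to be sent again; in the second half it
simply reads what is already there.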

In fact, this distinction goes back long before human society or
even the human species. For tens of millions of years, bees and ants
have each lived in communities with social structure involving
essential division of labor and communication of information necessary
to survival. In each of them, scouts are tasked with finding food
sources and reporting back; they must inform others where to go and
gather food. This architecture obviously allows much greater
efficiency than having each worker do their own scouting.

The bees and the ants use different communication techniques,
however. Honey bees use direct communication: the scout does a
"dance" in which particular body movements indicate the direction and
distance to the discovered food source. Ants, on the other hand,
modify their environment and leave a persistent message: the scout, on
the way back home after finding some food, activates a scent gland and
drags it on the ground. This creates a coded trail for the gatherers
to use to reach the discovery.

There are clearly scaling advantages for the ants: workers
returning from other jobs learn about other food sources immediately,
without needing someone to repeat the directions for them. The
directions can also be much more complex, involving numerous twists
and turns. Of course one cannot leave long-lasting scent trails in
moving air, so the bees, in their different environment, do what they
can.

2.2. Using Shared Memory

Some advantages of using shared memory to communicate:

Reduced coordination
overhead. Someone with information can simply contribute it, without
having to first find an appropriate audience. Someone wanting
information can look for existing information, browsing, skimming,
and perhaps using indexes and other navigation systems, without
needing to participate in social activities.

Increased reach. A persistent change naturally can reach more
people over time, and in many cases the changed part of the
environment can be cheaply replicated, allowing enormous reach over
distance. Live television (which is transitory) also has great reach,
but the Bible probably reached more people earlier and more cheaply.

Increased reliability. Our own memories are fallible, as are we
in general. Putting information down in writing allows many people to
consider and work with it.

Addressability. We can navigate in our environments, so if the
information is placed somewhere in the environment, we can identify
it by its location. "Your assignment [some information] is written on
the blackboard in the front of the room [a location]." Identification
by location is a two-edged sword, though, since the environment can be
changed. What happens when someone erases the blackboard, or changes
a few words?
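
The blackboard problem can be sketched in a few lines of Python. The
environment dictionary and the address string are hypothetical
stand-ins for a real shared environment and a real location.

```python
# A hypothetical environment: locations (addresses) mapped to contents.
environment = {"blackboard/front": "Assignment: read chapter 3"}

# A reader remembers only the address, not the content.
address = "blackboard/front"
seen_monday = environment[address]

# Someone changes a few words at the same location.
environment[address] = "Assignment: read chapter 4"
seen_tuesday = environment[address]

# Someone erases the blackboard: the address now leads nowhere.
del environment[address]
seen_wednesday = environment.get(address)  # None, nothing is there
```

The address stayed the same throughout; only what it led to changed.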

In modern society, shared memory has been most thoroughly
implemented in the medium of print. The publication of books and
journals has been essential to the progress of science and probably to
the progress of humanity in general. The print media have used all
the advantages above to great effect.

In general, we can think of using publications as the same as using
shared memory. The word "publish" comes from "public" and still
means "to make generally known", while shared memory does not need to
be public, but the similarity is still strong. Some publications are
in fact restricted to certain audiences, so perhaps the word is
gaining this meaning and we can speak, in general, of parts of the
shared environment which have been modified to carry information as
"publications."

Restricted publications, then, are like fenced off portions of the
shared environment. Some gate controls who can come in and see the
information, and perhaps something records who does so. The basic
techniques for contributing and obtaining information are the same;
they are simply done in an environment with access control.

None of this is novel, of course. Every large organization uses
persistent written records, and the larger organizations have detailed
schemes for identifying and managing each document. The web was
created in such an environment of written records, which were seen as
often inadequate, in an overall setting of scientific research, where
publication is clearly essential.

2.3. Information Servers

In an environment like that of the bees, where storing information
in the environment is not available as a means of communication, one
can sometimes create a virtual environment with similar characteristics.
I don't know how late-arriving gatherer bees learn the way to the
discovery, but I can imagine one possibility: some bee is tasked with
repeating the directions over and over and over. Perhaps that bee
stays in a special place; any gatherer without a job goes to that
special place and looks for someone giving directions.

Telephones offer people a message-passing, direct communication
architecture. One person calls another, initiating a conversation,
and information passes back and forth between them. This works well
for many things, but not so well for others. If I want to go see the
new James Bond movie, but I do not know when or where it is playing,
should I call my friends and ask them? Should I call random people?
No, I call the theater. And do they answer? No, they have a machine
which answers the phone and gives me the information I want. This
machine is like the direction-giving bee in a special location; it is
an information server.

Information servers, then, are things we talk to in a
message-passing world which serve the same purpose as a location for
stored information. When you can't post a sign or write a book, you
need to leave behind a person (or automaton) to carry your message for
you.

The Internet is essentially a message-passing system. TCP provides
a virtual circuit service, but still the information is sent from one
process to another across the net. Higher layers like FTP and HTTP
use information servers to provide a system of persistence, analogous
to physical locations allowing persistent communication.

The web, then, offers an environment where one can find various
persistent messages, just like the physical world, but without the
same costs. As long as the information servers stick to their simple
jobs of providing virtual pieces of paper, the system behaves
gracefully, like a library or ant trails. If they react to being
looked at, or record the fact that you looked at them, or change their
information based on who is looking, or change their information
incoherently, then the virtual environment is revealed and becomes confusing,
unpredictable, and perhaps undesirable.
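
As a rough sketch of a well-behaved information server, consider the
following Python class. Everything in it is an assumption made for
illustration: the class name, the tuple format of the request
message, and the showtimes are invented, and only the numeric status
codes are borrowed from HTTP. It is not how any real web server is
implemented.

```python
class InformationServer:
    """Answers request messages by reading from a private store."""

    def __init__(self, pages):
        self._pages = dict(pages)

    def handle(self, request):
        # A request is just a message: a (method, path) pair.
        method, path = request
        if method != "GET":
            return (405, "")
        if path not in self._pages:
            return (404, "")
        # The answer depends only on the path, and looking changes
        # nothing: the server acts like a virtual piece of paper.
        return (200, self._pages[path])


server = InformationServer({"/showtimes": "Bond: 7pm, 9:30pm"})
reply_one = server.handle(("GET", "/showtimes"))
reply_two = server.handle(("GET", "/showtimes"))
missing = server.handle(("GET", "/nowhere"))
```

Because handle neither records who asked nor varies its answer, two
readers asking the same question get the same reply, and the
message-passing machinery underneath stays hidden.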

2.4. Names and Addresses

Linguistic communication involves names for things, because of
course when we're talking about the location of some food, we cannot
use either the location itself or the food itself in our speech; we
must use words or phrases which identify or name them.
If I want to say "I like the portabella mushroom sandwiches at The
Lyceum Bar and Grill in Salem, Massachusetts," I need to use several
names for things. One of those names, the noun phrase "The Lyceum Bar
and Grill, in Salem, Massachusetts", identifies an organization I
could call on the phone (or send e-mail, as it turns out). If I put
that sentence about the mushrooms on a web page containing my opinions
about various local restaurants, and I want to tell people about that
web page, I need a name for it as well.

These two kinds of names are especially interesting: names of
things or people we can talk to and names of places where information
can be found. If a thing we might talk to is an information server,
it is in effect both kinds of things. Curiously, the English noun
"address" has both these meanings [Merriam-Webster]:

5a: a place where a person or organization may be communicated with

7: a location (as in the memory of a computer) where particular
information is stored

I suggest that an e-mail address is the first kind and a web
address is the second kind. While the web address is technically a
communications end-point, the server's masquerade as a virtual page
service, necessary for good web design, requires it to act like a
meaningful and consistent location for information.

3. Old Architectural Issues

@@ terminology: URL/URI, Resource

@@ shopping carts, application servers

@@ chat support

4. Issues Surrounding Machine Use of the Web

[@@ split this off into a different document?]

There are two W3C activities, each with its community of dedicated
participants, aimed at using the web to support machine
interoperability. They see the web as fundamentally successful but
only a small step toward what they want: machines communicating and
working together. This hints at artificial intelligence, but as the
analogy to bees and ants suggests, cooperative, coordinated activity
requires only a well-constructed system, not intelligent
participants.

The two activities are focused on different parts of the problem:
the web services effort involves standardizing the communication
patterns to make them easier to automate, while the semantic web
effort involves standardizing techniques for exchanging knowledge
between participants without prior arrangement.

While there is some tension between these groups as they compete
for mindshare and development resources, they are mostly
complementary, each addressing a different piece of a large puzzle.
From time to time, though, they do wander into each other's territory
because they need a partial solution before the other is ready; these
incursions, if not done tactfully and with a real awareness of the
boundaries, can lead to social difficulties.

So knowing the lay of the land is important; we need an overall
approach to interoperability, so we can see where and how the pieces
fit together. In general, the options for interoperability are so
open as to render choosing an approach absurd, but both these
approaches have chosen to leverage the web, so we have a hook: the
publication model narrows the field, making systems interoperate
primarily through storage areas in a shared environment.

4.1. Web Services

[ @@ old; rewrite and expand! ]

The Web Services approach to distributed systems construction attempts
to leverage the technique of publication. Let's publish an interface!
Let's take this addressing scheme and access protocol developed for
the web and use it for our virtual storefronts and service centers
(SOAP). We can of course publish the instructions for using these
virtual stores (WSDL), and we can publish directories of all these
access points, on-line and easily searchable (UDDI).

Is it still a publication? Yes, it's a place in the environment
where information can be put, but is it meant to be persistent and
shared? Perhaps sometimes. When it is -- when the purchase order
you submit has, itself, a URI -- then it's a publication. You can
have the purchaser and seller communicate by each erasing then writing
on the same blackboard, or they can each write on a new area of the
blackboard, keeping the old parts around for reference if needed.
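
The two blackboard styles can be sketched as follows, in Python. The
Blackboard class, its post/put/get methods, the URI layout, and the
shop.example host are all hypothetical; this is not SOAP or any
actual web services interface, only an illustration of giving each
submitted order its own address.

```python
import itertools


class Blackboard:
    """A hypothetical shared store where every posting gets an address."""

    def __init__(self, base_uri):
        self._base_uri = base_uri
        self._areas = {}
        self._numbers = itertools.count(1)

    def post(self, content):
        # Write on a fresh area of the board and return its address.
        uri = f"{self._base_uri}/orders/{next(self._numbers)}"
        self._areas[uri] = content
        return uri

    def put(self, uri, content):
        # Erase an existing area and write over it.
        self._areas[uri] = content

    def get(self, uri):
        return self._areas[uri]


board = Blackboard("http://shop.example")

# Style 1: both parties keep rewriting the same area.
order_uri = board.post("order: 10 widgets")
board.put(order_uri, "order: 10 widgets; status: accepted")

# Style 2: each party writes on a new area, so the history stays readable.
revision_uri = board.post("order: 12 widgets; revises " + order_uri)
```

In the second style the earlier order remains at its own address and
can be cited by later postings, which is what makes it behave like a
publication.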

4.2. The Semantic Web

[ @@ old; re-target! ]

The Semantic Web approach to building distributed systems also
attempts to leverage the technique of publication. Here the focus is
on making the publications themselves "understandable" to machines.
But what does it mean for a machine to "understand" something? Where
is the line between data processing and intelligent thought? It
does not matter here; we define "machine understanding" in terms of
observable behavior. A machine understands a publication if it uses
the information carried by the publication to guide its behavior.
For the foreseeable future, it can do this only because it has been
programmed to, not because of some kind of true intelligence.

In other words, a program may well understand the message "Turn on the
sprinkler system just before dawn" because it has been programmed to
recognize exactly that text string and run a sub-program which
computes the time of the sunrise and activates the sprinklers
accordingly. The same behavior could follow from "RUN MORNSPRK.BAT";
the machine's understanding of the two messages is the same. This is
not rocket science; it is pretty simple computer science. Trying to
build the Semantic Web is about trying to get straight how we connect
messages and their meanings in a way which works not just on your own
machine (which is pretty easy), but which scales across the globe.
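
This behavioral notion of understanding can be shown with a small
Python sketch. The HANDLERS table and the function names are
hypothetical, and the sprinkler routine is a stand-in that only
returns a description rather than computing a sunrise time.

```python
def morning_sprinklers():
    # Stand-in for computing the sunrise and activating the sprinklers.
    return "sprinklers scheduled before sunrise"


# The machine "understands" exactly the strings it was programmed
# to recognize, and nothing else.
HANDLERS = {
    "Turn on the sprinkler system just before dawn": morning_sprinklers,
    "RUN MORNSPRK.BAT": morning_sprinklers,
}


def understand(message):
    """Act on a message if it is recognized; otherwise do nothing."""
    handler = HANDLERS.get(message)
    if handler is None:
        return None
    return handler()


english = understand("Turn on the sprinkler system just before dawn")
batch = understand("RUN MORNSPRK.BAT")
unknown = understand("Water the lawn at sunrise")
```

Both recognized messages lead to the same behavior, while a
differently worded request is not understood at all. A table like
this works on one machine because one programmer wrote both the
messages and the handlers; that is the part which does not scale.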

[ @@ old; explain better! ]

One short pre-Semantic Web analysis of this problem [AIMA] suggests
that one of the fundamental problems here is coming up with names for
newly discovered things (such as food sources!) and communicating
their definitions (in a language the computers already understand). A
publishing system (such as the web) gives us handy solutions to both
these problems: we can make up names which are unique within a
particular publication, and then globally name things by naming the
publication (which we already know how to do) and then providing the
publication-specific name. Definitions can also be provided in the
same publication (or perhaps other ones), addressing the second half
of this problem. This architecture of referring to definitions
instead of transmitting them allows compact communication without a
presumption of state.
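
One way to picture this naming scheme is the Python sketch below. It
assumes, purely for illustration, that the publication-specific name
is attached to the publication's address as a fragment after a "#";
the publications dictionary, the example.org address, and the
OakTreeSource name are all invented.

```python
from urllib.parse import urldefrag

# Hypothetical publications, each carrying definitions for the names
# it introduces.
publications = {
    "http://example.org/scouting": {
        "OakTreeSource": "food source at the base of the oak tree",
    }
}


def global_name(publication_uri, local_name):
    """Name the publication, then give the publication-specific name."""
    return f"{publication_uri}#{local_name}"


def define(name):
    """Look up a definition by following the name to its publication."""
    publication_uri, local_name = urldefrag(name)
    return publications[publication_uri][local_name]


name = global_name("http://example.org/scouting", "OakTreeSource")
definition = define(name)
```

A message can then carry only the short name; a receiver who has
never seen it before can follow it to the publication for the
definition, so neither side needs to have stored anything in advance.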

@@ other issues?

5. Conclusions

@@ the web is about web pages, and we push that definition at our
peril. web services and the semantic web fit together nicely in this
model; it's a good way for machines to communicate, too.

Acknowledgments

The ideas in this paper grew out of conversations with Tim
Berners-Lee and Dan Connolly. Without them (even if the web still
somehow existed!) this work would not have been possible.

This work has been supported by the DARPA/DAML project under
MIT/AFRL cooperative agreement number F30602-00-2-0593.

Despite the author's affiliation with the W3C, this work is
obviously not on the W3C recommendation track. It is not the product
of a W3C working group or interest group and should in no way be
construed as reflecting the position of the W3C or its members.