The worldroot server manages a local version of the top two levels
of the distributed KOE namespace. A single worldroot server is
generally shared between all KSSs at a local site. We will call a
collection of KSSs that share a worldroot server a "cluster." The
worldroots of different clusters (called worldroot peers) exchange
information about updates so they all share the same view of the
world. Updates propagate asynchronously so clusters may temporarily
have different views of the world. This is intentional, since
synchronous updates wouldn't scale to a KOE with zillions of clusters.

For testing purposes, it is often useful to run a cluster in
isolation (i.e. not linked to other clusters). By default, the
worldroot server runs this way. To operate with a shared namespace,
the worldroot must be configured with the names of peer worldroot
servers.

-f

By default, worldroot checks for the presence of the file
var/worldroot.sbh (see below), which normally
exists only when a worldroot server is running. If the file is found,
the worldroot will not start. Occasionally, a worldroot server may
exit without removing this file; in this situation the -f option
skips the check.
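The startup check described above could be sketched as follows. This is an illustrative sketch, not the server's actual code; the function name may_start and its signature are hypothetical, while the file location follows the var/worldroot.sbh convention described in this document.

```python
import os

# Hypothetical sketch of the startup check on var/worldroot.sbh.
# KOSROOT and the file location follow the conventions in the text;
# the function name is illustrative.
SBH_FILE = os.path.join(os.environ.get("KOSROOT", "."), "var", "worldroot.sbh")

def may_start(force=False, sbh_file=SBH_FILE):
    """Return True if the worldroot server may start.

    The presence of the SBH file normally means a worldroot server is
    already running.  Passing force=True (the -f option) skips the
    check, e.g. after a crash that left a stale file behind.
    """
    if force:
        return True
    return not os.path.exists(sbh_file)
```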

-h

Print a help message and exit. (The same response is triggered
by specifying an invalid command line option.)

-n name

Specify an alternative name for the worldroot server, which is
also interpreted as the cluster's name. The name is used in two
places: as the ILU server name, which affects the SBHs of objects
created by the worldroot server, and as the peer name in the
replication process, which appears in the first column of
var/wr-peers.tab (see below). The default
worldroot server name is the full host name of the machine it runs on.

-p port

Specify an alternative TCP port for the ILU server. This is only
needed to allow two worldroot servers to run on the same machine,
e.g. for testing purposes. The default port is 7438.

-q

Operate in quiet mode. In this mode, only error messages are
printed. By default, some informative messages and warnings are also
printed, especially during startup. Note: -v and -q are
mutually exclusive.

-s

Run a "sweep" operation before entering full service mode. See
below for a full description.

-v

Operate in verbose mode. In this mode, the server prints messages
about many events. By repeating the option, the server can be made
even more verbose. Note: -v and -q are mutually
exclusive.

The worldroot server uses the following files. All filenames are
relative to the $KOSROOT directory.

config/access.conf

Directives in this file determine the hosts that are acceptable as
peers (<Limit WORLDROOTPEER>) and the hosts that can shut the
worldroot server down (<Limit WORLDROOTSHUTDOWN>). See the
documentation on security issues for
syntax and implications.

var/worldroot.sbh

On startup, the worldroot server writes the SBH of its root
context to this file. The file is removed on exit, to signal to
starting service stations that no worldroot server is running.
The worldroot server refuses to run if this file is present on
startup, unless the -f option is given. The SBH does not
change between invocations, but it does depend on the values specified
for the -p and -n options, and on the fingerprints of
the object types defined in the WorldAPI ISL file.

var/wr-replicator.sbh

On startup, the worldroot server writes the SBH of its worldroot
replicator object to this file. The SBH is used to configure
worldroot peers; it must be manually copied into the peers table (see
below) of the peer. The SBH does not change between invocations, but
it does depend on the values specified for the -p and -n
options, and on the fingerprints of the object types defined in the
WorldAPI ISL file.

var/wr-peers.tab

Table of peer replicators. On startup, the worldroot server reads
this file to find the names and SBHs of its worldroot peers. It also
adds its own name and SBH. If no peers table is found, the worldroot
server starts off in stand-alone mode. If another worldroot server
connects and the connection is allowed by the access.conf
configuration, it will be added to the peers table.

The file format is as follows: each line contains two
whitespace-delimited fields. The first field is the name of a peer,
the second is the last known SBH of its worldroot replicator. Blank
lines and lines whose first non-blank character is a hash sign
(comments) are ignored. Other lines that do not contain exactly two
fields are flagged as errors and ignored.

It is not necessary to specify all known worldroot servers as
peers. It is sufficient that the peer-to-peer relationships define a
bidirectional connected graph. For redundancy, it makes sense to have
at least two peers so that a server is not isolated when its only peer
goes down. Cycles in the graph are not a problem; updates received
more than once are ignored and are not propagated further.
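The file format described above is simple enough to sketch a parser for. This is an illustrative sketch, not the server's actual parser; the function name and return shape are assumptions.

```python
def parse_peers_table(lines):
    """Parse var/wr-peers.tab lines into ({name: sbh}, [error line numbers]).

    Blank lines and lines whose first non-blank character is '#' are
    ignored; lines without exactly two whitespace-delimited fields are
    flagged as errors and skipped, as described in the text.
    """
    peers = {}
    errors = []
    for lineno, line in enumerate(lines, 1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                  # blank line or comment
        fields = stripped.split()
        if len(fields) != 2:
            errors.append(lineno)     # malformed line: flag and ignore
            continue
        name, sbh = fields
        peers[name] = sbh
    return peers, errors
```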

var/wr-currentdb*

Database containing the current state of the namespace contexts
maintained by this server. This database is used for persistence in
the face of worldroot server crashes, and is consulted before recovery
information from peer servers is gathered. The filename extension(s)
(if any) depend on the database module used; depending on which
module the Python configuration provides, this can be dbm, gdbm, or
the BSD hash table library.
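A current-state database of this kind might be maintained as sketched below. This is a hypothetical sketch using Python's generic dbm interface (the modern equivalent of the dbm/gdbm choice mentioned above); the function names and the key/value schema are assumptions, not the worldroot server's actual layout.

```python
import dbm  # selects dbm.gnu, dbm.ndbm, or dbm.dumb per the Python installation

def save_binding(db_path, name, sbh):
    """Persist one namespace binding so it survives a server crash."""
    with dbm.open(db_path, "c") as db:    # 'c' creates the database if needed
        db[name.encode()] = sbh.encode()

def load_bindings(db_path):
    """Reconstruct all bindings from the database on startup."""
    with dbm.open(db_path, "r") as db:
        return {k.decode(): db[k].decode() for k in db.keys()}
```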

A cluster's worldroot needs to be running when a new KSS is
started. Normally the koeboot utility
starts the worldroot before starting any service stations.

The worldroot server only manages the top two hierarchical levels
in the KOE namespace: the root context and the contexts immediately
below it (for an explanation of the concept of a namespace context,
see the nstools module). In the
current implementation, the root context cannot be modified. It
contains four standard contexts named "kos", "tools", "types" and
"replicators", which are used as follows:

kos

This context contains entries for each KSS in the KOE. (As the
KOE grows, the namespace will start using a separate context for each
cluster and group clusters together in a hierarchy for scalability
and ease of management.)

tools

This context contains entries for KOE-wide tools like the visualizer. (Again, for scaling purposes, this will
be replaced by a hierarchy.)

types

Currently not used. It is planned to have a replicated type
registry for ILU ISL files here.

replicators

The collection of all known worldroot servers, for purposes
of the sweep operation described below (see the -s option).

It is possible to restart the worldroot server after service
stations have started, e.g. if the worldroot crashed. The namespace
contexts run by the worldroot will be inaccessible while the server is
being restarted. As a result, KP operations like migrate() will raise
exceptions (because the "kos" context is unavailable). The service
stations themselves, however, will survive a temporary worldroot
service interruption.

Replicated operation is triggered by the presence of one or more
peer servers in var/wr-peers.tab (described above). In
replicated operation, any updates in the contexts managed by a
worldroot server are asynchronously propagated to all its currently
reachable peers. Updates received from one peer are propagated to all
other peers.

Let's consider a simple example with two clusters, A and B.
When a program in cluster A does a lookup of "kos/monty" relative to
the worldroot context, its request will be processed by the worldroot
server for cluster A; likewise in cluster B. The lookups are
completely independent; if cluster B is unreachable, it doesn't
affect the lookup in cluster A at all. Both lookups, however, will
come up with an SBH for the same object. The object is not
replicated; it might live in either cluster, or in a third cluster.
The object must be alive and reachable in order to use it, but not to
look it up.

Now let's look at an update made to a context managed by a
worldroot server. Assume that a new KSS "repoman" is added to cluster
A. When it is started, the KSS makes a call to its cluster's
worldroot server to create a binding for "kos/repoman". When the
worldroot server replies, it has added the binding to its local copy
of the "kos" context and stored it in its current state database
(var/wr-currentdb*, see below). The worldroot
server propagates this
update to all its peers, who propagate it to all their peers,
and so on, until it has spread to the entire KOE. During the time it
takes for the update to propagate, it is possible that a lookup of
"kos/repoman" will fail on some clusters and succeed on others. This
inconsistency is temporary: As long as all peers remain connected,
they will receive the update eventually.
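The propagation scheme above, including the rule that updates received more than once are ignored and not forwarded (which makes cycles in the peer graph harmless), can be sketched as follows. This is a minimal illustration, not the actual replication protocol; it assumes each update carries a globally unique id made of the origin's name and a per-origin sequence number.

```python
class Replicator:
    """Illustrative peer that floods updates with duplicate suppression."""

    def __init__(self, name):
        self.name = name
        self.peers = []        # connected Replicator objects
        self.seen = set()      # ids of updates already applied
        self.bindings = {}     # local copy of the replicated contexts

    def local_update(self, key, sbh):
        # Assumed id scheme: (origin name, per-origin sequence number).
        update_id = (self.name, len(self.seen))
        self.receive(update_id, key, sbh, source=None)

    def receive(self, update_id, key, sbh, source):
        if update_id in self.seen:
            return             # already applied: do not propagate further
        self.seen.add(update_id)
        self.bindings[key] = sbh
        for peer in self.peers:
            if peer is not source:
                peer.receive(update_id, key, sbh, source=self)
```

Even if A, B, and C are all peers of one another (a cycle), each server applies a given update exactly once.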

When a worldroot server is down, it can miss updates. Therefore,
when a worldroot server starts up, it first reconstructs its
state from its local database (var/wr-currentdb*,
see above) and then
connects to its peers and asks them to retransmit the updates that it
has missed. Updates received during recovery are not propagated to
other peers. It is therefore possible, though not very likely, that
some updates are not propagated across the entire KOE. The KOE-wide sweep
operation, described in the next section, can correct the resulting
inconsistency.
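The startup recovery sequence just described, local reconstruction first, then retransmission requests to peers, might look like the following sketch. The function, the updates_since method, and the sequence-number bookkeeping are all hypothetical; only the two-step order and the rule that recovered updates are not forwarded come from the text.

```python
def recover(local_db, peers, last_seen):
    """Return recovered bindings after a worldroot server restart.

    local_db  -- {key: sbh} read back from var/wr-currentdb*
    peers     -- objects with a hypothetical updates_since(seqno) method
    last_seen -- highest sequence number recorded before the shutdown
    """
    bindings = dict(local_db)          # step 1: rebuild from local database
    for peer in peers:
        for key, sbh in peer.updates_since(last_seen):
            bindings[key] = sbh        # step 2: apply missed updates only;
                                       # they are not propagated onward
    return bindings
```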

A KOE-wide sweep operation is initiated by starting any worldroot
server with the -s option. During a sweep, updates are
collected from all known worldroot servers, processed, and then
distributed back to all participating servers. After the sweep all
participants are synchronized in their view of the world. The sweep
operation does not just operate on the collection of peers found in
var/wr-peers.tab; instead, it accesses
all known worldroot servers. This collection is found in the
"replicators" context of the KOE namespace. The sweep operation can
only succeed when all participating servers are up. If there is a
failure, the sweep operation is aborted and the initiating server
transitions into normal service mode.
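The sweep algorithm can be summarized as: collect from every known server, merge, distribute back, and abort on any failure. The sketch below illustrates that shape only; the collect/apply interface and the last-writer-wins merge policy are assumptions, not the server's actual protocol.

```python
def sweep(servers):
    """Hypothetical sweep over all known worldroot servers.

    servers -- objects with collect() -> {key: sbh} and apply(dict).
    Returns the merged view, or None if the sweep was aborted.
    """
    merged = {}
    try:
        for server in servers:
            merged.update(server.collect())   # gather updates from everyone
        for server in servers:
            server.apply(merged)              # distribute the merged view back
    except ConnectionError:
        return None                           # any failure aborts the sweep
    return merged
```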

Sweeps can be expensive. Normally sweeps are not necessary because
the recovery information exchanged with peers at server startup time
(including after crashes) should bring a server's world view up to
date. Pathological situations are possible, however, when updates are
not propagated due to extended down time of several servers or network
connections. Sweeps take care of recovery after such failures.