The programs that actually perform Slony-I replication are the
slon daemons.

You need to run one slon instance for each node
in a Slony-I cluster, whether you consider that node a
"master" or a "slave". Since a MOVE SET or
FAILOVER can switch the roles of nodes, slon needs
to be able to function for both providers and subscribers.

On Windows™, things are slightly different when slon runs as a
service. One slon service is installed, and a separate configuration
file is registered for each node to be serviced by that machine. The
main service then manages the individual slons itself.

It is not essential that these daemons run on any particular host, but
there are some principles worth considering:

Each slon needs to be able
to communicate quickly with the database whose "node
controller" it is. Therefore, if a Slony-I cluster runs
across some form of Wide Area Network, each slon process should run on
or near the database it controls. If you break this rule,
no particular disaster should ensue, but the added latency in
monitoring events on the slon's "own node" will cause
it to replicate in a somewhat less timely
manner.

The very fastest results would be achieved by having
each slon run on the database server that
it is servicing. If it runs somewhere within a fast local network,
performance will not be noticeably degraded.

It is an attractive idea to run many of the
slon processes for a cluster on one
machine, as this makes it easy to monitor them both in terms of log
files and process tables from one location. This also eliminates the
need to log in to several hosts in order to look at log files or to
restart slon instances.

Warning

If at all possible, do not run the slon that is
responsible for servicing a particular node across a WAN link.
Any problem with that link can kill the connection
while leaving "zombied" database connections on the node
that (typically) will not die off for around two hours. This prevents
starting up another slon, as described in the FAQ under multiple slon
connections.

Historically, slon processes have
been fairly fragile, dying if they encountered just about any
significant error. This behaviour mandated running some form of
"watchdog" to make sure that if one
slon fell over, it would be replaced by
another.

There are two "watchdog" scripts currently
available in the Slony-I source tree:

tools/altperl/slon_watchdog -
an "early" version that basically wraps a loop around the
invocation of slon, restarting any time it falls
over

tools/altperl/slon_watchdog2 -
a somewhat more intelligent version that periodically polls the
database, checking to see if a SYNC has taken place
recently. We have had VPN connections that occasionally fall over
without signalling the application, so that the slon
stops working, but doesn't actually die; this polling addresses that
issue.
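The restart-loop idea behind the first script can be sketched as follows. This is a minimal illustration in Python, not the code of slon_watchdog itself (which lives in tools/altperl); the function name watch, its parameters, and the restart delay are all assumptions made for the example.

```python
import subprocess
import sys
import time


def watch(cmd, max_restarts=None, delay=10.0):
    """Run cmd and relaunch it every time it exits.

    cmd          -- argument list for the process to supervise
    max_restarts -- stop after this many relaunches (None = loop forever)
    delay        -- seconds to wait before each relaunch (hypothetical default)

    Returns the total number of launches, which is only reachable
    when max_restarts is set.
    """
    launches = 0
    while max_restarts is None or launches <= max_restarts:
        launches += 1
        # Blocks until the supervised process terminates.
        exit_code = subprocess.call(cmd)
        print(f"launch {launches} exited with code {exit_code}",
              file=sys.stderr)
        # Pause so a process that dies instantly does not spin the CPU.
        time.sleep(delay)
    return launches
```

A call such as watch(["slon", "mycluster", "dbname=mydb host=localhost"]) would supervise one slon; the cluster name and connection string here are placeholders. Note that a loop like this only notices a slon that exits, not one that hangs, which is the gap the second script is meant to close.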

The slon_watchdog2 script is usually
the preferable one to run. At one point it was
not advisable to run it while subscribing a very large
replication set, where the
COPY SET (the main event that processes a
SUBSCRIBE SET request) is expected to take many hours. In that case,
the script concluded that, since no
SYNC had been processed in 2 hours, something was broken and
slon needed restarting, which in turn restarted the COPY SET
event. More recently, the script has been changed to detect a
COPY SET in progress.
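The decision described above can be expressed as a small function. This is a sketch under stated assumptions: the name should_restart and its parameters are hypothetical, the two-hour threshold is taken from the text, and how the real script obtains the last SYNC time or detects a running COPY SET is not shown here.

```python
from datetime import datetime, timedelta


def should_restart(last_sync, now, copy_in_progress,
                   max_sync_age=timedelta(hours=2)):
    """Decide whether a watchdog should restart slon.

    last_sync        -- time of the most recent SYNC seen on the node
    now              -- current time
    copy_in_progress -- True if a COPY SET is known to be running
    max_sync_age     -- how stale a SYNC may be before slon is
                        considered stuck (2 hours, per the text)
    """
    if copy_in_progress:
        # A long COPY SET legitimately delays SYNCs; restarting slon
        # here would throw away the copy and start it over.
        return False
    return (now - last_sync) > max_sync_age
```

Keeping the check free of database access makes the rule easy to test on its own; the caller is responsible for supplying last_sync and copy_in_progress from whatever queries it runs against the node.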

In Slony-I version 1.2, the structure of
slon was revised fairly substantially
to make it much less fragile. The main process should only die off if
you explicitly signal it to terminate.

A new approach is available in the Section 21.4 script which uses
slon configuration files and which may be
invoked as part of your system startup process.