Defining the nodes indicated the shape of the cluster of
database servers; it is now time to determine what data is to be
copied between them. The groups of data that are copied are defined
as "replication sets."

A replication set consists of the following: keys on tables that
have no suitable primary key, the tables that are to be replicated,
and the sequences that are to be replicated.

Keys on tables that are to be replicated that have no
suitable primary key

Slony-I needs to have a primary key or a
candidate thereof on each table that is replicated. PK values are
used as the primary identifier for each tuple that is modified in the
source system. Note that they can be composite keys composed of
multiple NOT NULL columns; they don't need to consist of single
fields. There are three ways that you can get Slony-I to use a
primary key:

If the table has a formally identified primary key,
SLONIK SET ADD TABLE can be used without any need to
reference the primary key. Slony-I can automatically pick up that
there is a primary key, and use it.
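For instance, such an invocation might look like the following
sketch (the set, node, and table names here are hypothetical, not
taken from any particular installation):

```
set add table (set id = 1, origin = 1, id = 1,
    fully qualified name = 'public.orders',
    comment = 'orders table; primary key is picked up automatically');
```

Since the table has a declared primary key, no key parameter is
needed.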

If the table hasn't got a primary key, but has some
candidate primary key, that is, some index on a
combination of fields that is both UNIQUE and NOT NULL, then you can
specify that key, as shown in the following example.
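A sketch of such an invocation, assuming a hypothetical table
public.ticket_status with a UNIQUE, NOT NULL index named
ticket_status_by_code:

```
set add table (set id = 1, origin = 1, id = 2,
    fully qualified name = 'public.ticket_status',
    key = 'ticket_status_by_code',
    comment = 'ticket_status, replicated via candidate primary key');
```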

However, once you have come this far, there is little reason not
to just declare some suitable index to be a primary key, which requires
that the columns involved are NOT NULL, and which will establish a unique
index. Here is an example of this:
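Assuming the same hypothetical table, with the candidate key on a
NOT NULL column named code, promoting it to a true primary key would
look something like:

```
ALTER TABLE public.ticket_status
    ADD PRIMARY KEY (code);
```

After that, SLONIK SET ADD TABLE can be used without the key
parameter, as in the formally-identified-primary-key case.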

If your application is not somehow referencing the index by name,
then this should not lose you anything, and it gives you the clear design
benefit that a primary key has been declared for the table.

Notice that while you need to specify the namespace for the
table, you must not specify the namespace for the
key, as Slony-I infers the namespace from the table.

If the table hasn't even got a candidate primary key,
you might ask Slony-I to provide one using
SLONIK TABLE ADD KEY.

Warning

SLONIK TABLE ADD KEY was always considered
a "kludge", at best, and as of version 2.0, it is
considered such a misfeature that it is being removed.

It is not terribly important whether you pick a
"true" primary key or a mere "candidate primary
key;" it is, however, strongly recommended that you have one of
those instead of having Slony-I populate the PK column for you. If
you don't have a suitable primary key, that means the table has no
mechanism, from your application's standpoint, for
keeping values unique. Slony-I may, therefore, introduce a new
failure mode for your application, and it also implies that you have
a way to enter confusing (duplicate) data into the database.

It will be vital to group tables together into a single set if
those tables are related via foreign key constraints. If tables that
are thus related are not replicated together,
you'll find yourself in trouble if you switch the "master
provider" from one node to another, and discover that the new
"master" can't be updated properly because it is missing
the contents of dependent tables.
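Grouping related tables is just a matter of adding them to the same
set; a minimal sketch, using hypothetical tables public.orders and
public.order_lines that are linked by a foreign key:

```
create set (id = 1, origin = 1,
    comment = 'orders and order_lines must replicate together');
set add table (set id = 1, origin = 1, id = 1,
    fully qualified name = 'public.orders');
set add table (set id = 1, origin = 1, id = 2,
    fully qualified name = 'public.order_lines');
```

With both tables in set 1, any node subscribing to the set receives
them together, so the foreign key dependency is never left dangling.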

There are also several reasons why you might
not want to have all of the tables in one
replication set:

The initial COPY_SET event for a
large set leads to a long-running
transaction on the provider node. The FAQ outlines a number of
problems resulting from long-running transactions that can injure
system performance.

If you can split such a large set into several smaller pieces,
that will shorten the length of each of the transactions, lessening
the degree of the "injury" to performance.

Another issue comes up particularly frequently when replicating
across a WAN; sometimes the network connection is a little bit
unstable, such that there is a risk that a connection held open for
several hours will lead to CONNECTION TIMEOUT. If
that happens when you are 95% done copying a 50-table replication set
consisting of 250GB of data, that could ruin your whole day. If the
tables were, instead, associated with separate replication sets, that
failure at the 95% point might only interrupt, temporarily, the
copying of one of those tables.

These "negative effects" tend to emerge when the
database being subscribed to is many gigabytes in size and where it
takes many hours or even days for the subscriber to complete the
initial data copy. For relatively small databases, this shouldn't be
a material factor.

Any time you invoke SLONIK EXECUTE SCRIPT,
it requests a lock on every single table in the
replication set, first on the origin node, and then, as the event
propagates, on each subscriber node.

There have been reports "in the field" of this
leading to deadlocks such that the SLONIK EXECUTE SCRIPT
request had to be submitted many times in order for it to actually
complete successfully.

The more tables you have in a set, the more tables need to be
locked, and the greater the chances of deadlocks.

By the same token, if a particular DDL script only needs to
affect a couple of tables, you might use SLONIK SET MOVE
TABLE to move them temporarily to a new
replication set. By diminishing the number of locks needed, this
should ease the ability to get the DDL change into place.
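The temporary-move approach might be sketched as follows; the set
IDs, table IDs, and script filename here are all hypothetical:

```
set move table (origin = 1, id = 7, new set = 2);
set move table (origin = 1, id = 8, new set = 2);

execute script (set id = 2, filename = '/tmp/ddl_change.sql',
    event node = 1);

set move table (origin = 1, id = 7, new set = 1);
set move table (origin = 1, id = 8, new set = 1);
```

Only the two tables in set 2 need to be locked while the DDL script
runs; the tables remaining in set 1 are untouched.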

There is a further discussion of
locking which outlines where Slony-I requires locks likely to
intrude on your applications.

Each time a SYNC is processed, values are recorded for
all of the sequences in the set. If there are a
lot of sequences, this can cause sl_seqlog to grow rather
large.
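Sequences are added to a set in much the same way as tables; a
sketch with a hypothetical sequence name:

```
set add sequence (set id = 1, origin = 1, id = 1,
    fully qualified name = 'public.ticket_id_seq',
    comment = 'ticket ID sequence');
```

Each sequence added this way will have its value recorded in
sl_seqlog at every SYNC.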

This points to an important difference between tables and
sequences: if you add additional tables that see little or no
activity, this does not add any material load to the work being done
by replication. For a replicated sequence, values must
regularly be propagated to subscribers. Consider
the effects:

A replicated table that is never updated does not
introduce much work to the system.

If it is not updated, the trigger on the table on the origin
never fires, and no entries are added to sl_log_1/sl_log_2. The table never appears in any of the
further replication queries (e.g. in the
FETCH 100 FROM LOG queries used to find
replicatable data) as they only look for tables for which there are
entries in sl_log_1/sl_log_2.

In contrast, a fixed amount of work is introduced to
each SYNC by each sequence that is replicated.

Replicate 300 sequences, and 300 rows need to be added to
sl_seqlog on a regular basis, at least until the 2.0 branch,
where updates are only applied when the value of a given sequence is
seen to change.

It is more than likely that if the value of a particular
sequence hasn't changed since it was last checked, perhaps the same
value need not be stored over and over; some thought needs to go into
how to do that safely.

Bug #1226 indicates an error condition that can come up if
you have a replication set that consists solely of sequences.

This is documented in more detail in the FAQ;
the long and short of it is that having a replication set
consisting only of sequences is not a particularly good
idea.