Replication and 'nosync' Initializations in SQL Server 2000

Introduction

The normal method of initialization for a replication publication is to run the snapshot agent and then
use the distribution (snapshot/transactional) or merge (merge) agents to distribute the snapshot files. Briefly, when the snapshot agent runs, it prepares a series of text files in the distribution working folder on the publisher/distributor. These files contain everything required to set up the subscriber.
There are many of these files, and what exactly is in them depends on the type
of replication you are doing, but they may include table creation scripts,
stored procedures, triggers and the actual table data. The files are transferred to the subscriber by the distribution/merge agent and applied there. However this is not always ideal,
simply because the subscriber may already have the data. After all, if the data
occupies a lot of space, why would you choose to create snapshot files of all of
it on the publisher and send them over your LAN apply the inserts at the
subscriber, just to arrive at the same point you started from? So, if you want
to avoid the creation, transmission and application of the snapshot, the method
is known as a 'nosync' initialization and is examined in this article, along
with the ramifications of using this method.

Implementation

I'm assuming in this article that we have identical data at the publisher and
subscriber. How we arrive at this situation is not covered - it could be DTS,
backup/restore or just be that the business works that way.

'Nosync' initialization derives its name from the @sync_type = 'none' parameter value of
sp_addsubscription. In theory it applies to snapshot, transactional and merge
replication but it only really makes logical sense in the case of the latter two,
so we'll leave out snapshot replication from now on. In this article we'll be looking at transactional, although many of the points apply equally well to merge.

Consider if we are replicating a single table - tCompany. The table and
its data already exists on the publisher and subscriber.

We create the publication as per usual, and then when creating
the subscription, we state that the subscriber already has the data:

So far it seems straightforward to set up. However, if you do an insert on the
publisher and run the distribution agent, the following error is received:

Remember that in transactional replication, an insert is read by the log reader
agent and placed in the distribution database. The actual command exists in the
MSrepl_commands table in binary, but can be converted to readable format using
sp_browsereplcmds. So, for out insert, we see that the command propagated to
the subscriber is not a simple insert, but a call to an insert stored procedure:
sp_MSins_tCompany. This means that the normal initialization process
not only propagates the schema, but replication stored procedures are also
created at the subscriber, and in our case are missing.

To create these stored procedures, SP1 onwards provides a system stored
procedure - sp_scriptpublicationcustomprocs. If we run
sp_scriptpublicationcustomprocs 'publicationname' at the publisher, the results
(in text format) are 3 stored procedure creation scripts. These are then run on the subscriber
and subsequently running the distribution agent will remove the error and
propagate the insert normally.

Adding Additional Articles

The setup now is slightly different to a normal one. The subscription is
labelled as a nosync one and if we want to add a new article to the publication the same process
as above is required.
Running the snapshot agent in this case is no use and results in the
message:

A snapshot was not generated because no subscriptions needed initialization.

Note however that as a poster on
the Microsoft newsgroup (Hassan) pointed out to me this is actually possible
by script ie using sp_addsubscription with @sync_type = automatic it is
possible to add a new article and have the snapshot agent work as per usual. I
wouldn't recommend doing this as it'll lead to an unmaintainable setup, but in
some rare circumstances it could be useful.

Considerations for Identity Columns in the
Table Schema

Mostly the methodology above suffices, but what is the table
schema was slightly different:

In this case the ID column has the identity attribute. If the schema at the
subscriber is identical to that at the publisher, then there will be other
problems.

An insert or delete on the publisher propagates normally. However, an update to
CompanyName will result in the error:

This seems peculiar, as we are not updating this column. The explanation comes
when we look at the code in sp_MSupd_tCompany. The first part deals with an
update to the ID column. It can never be executed, but nevertheless it causes the
compilation to fail.
So, this means that either the stored procedure needs to be edited (and the
first section removed), or you must drop the identity attribute off the column on the subscriber.
This is not a bug as such - normal initialization would generate a different
schema on the subscriber and remove the identity attribute itself..

Concerns about non-distributed commands

Another
issue to consider is what happens to the changes made on the publisher while
we are doing our no-sync initialization - will they cause us a problem? There
are 2 main scenarios:

The
first case involves the issue of important commands missing from the
distribution database. Imagine we do the backup of an existing subscriber,
restore the database on the new subscriber and then initialize (nosync).
However, while doing this process, there has been an insert on the publisher.
The insert command has been synchronized with the existing subscriber and
subsequently removed from MSrepl_commands by the distribution cleanup agent.
The 2 databases are now out of sync and a publisher update of the newly added
row, when synchronized with the second subscriber, will result in the
following message:

The
row was not found at the Subscriber when applying the replicated command.

The
other case involves additional, unnecessary commands existing in the
distribution database compared to the subscriber.

As
before we do a backup of an existing subscriber and restore it on another
server. While doing the database transfer, there has been an update and delete
on the publisher. So, in the MSrepl_commands table on the distribution
database there are 2 commands, and we wonder if they will be applied to our
new subscriber. The MSreplication_subscriptions table on the subscriber holds
a transaction timestamp which is used to determine the commands selected. The
value is the last transaction in the MSrepl_commands table for this
publication at the time the snapshot was applied. In other words
synchronization of the new subscriber will pick up only commands committed
after the initialization and will omit these 2 commands.

Although
it's true that by using the info above, you might see that it is possible to
hack the system table to avoid these problems (and make sure that the cleanup
agent is disabled), the simplest recommendation is to ensure that there have
been no changes made to the publisher since the copy of the data was taken.

Conclusions

Nosync initializations are a
useful way of avoiding the creation and propagation of the initial snapshot.
They can allow you to setup your replication environment without imposing an
unnecessary load on your network. The caveat is that the data on the publisher
and the subscriber needs to be the same. Additional consideration needs to be
taken to create the relevant stored procedures, to take account of identity
columns, to consider any additional articles later added to the publication
and to deal with commands executed during the setup time.