Implementing Bi-Directional Transactional Replication

Bi-Directional replication is copying data from one server to another where:

• either server can be the source or destination, and• the data being copied can be in the same tables on both sides.

For example, I want to replicate a database between Server A and Server B, so that:

• the data which originates on Server A will be replicated to Server B, and • the data which originates on Server B will be replicated to Server A.

Essentially, I want writeable copies of the data in two different databases - which might be on the same or different servers - and I want the data to be consistent ( i.e. - the same on both sides).

Most implementations on which I consult have a “real time” requirement, i.e. at any one time the data in both database is identical. Though not really achievable using bi-directional replication, this may be achieved by doing some sort of split write (open a transaction, write on server A, write on Server B, commit the transaction, and then return to the application so it can move on and do different work).

Of course, using a split write has significant application latency considerations, and the link between the two servers must be very stable. Application latency refers to how long it takes for the application to commit a transaction to the database(s). The two servers also must be very close to each other; the greater the distance between the two servers, the greater the application latency. Basically, if you go down the split write route, you do get identical point in time synchronization of the two databases, but the cost is transferred to application latency where transactions which originally took 10 ms, may now take 100 ms or more.

Normally, I have some education process where I have the client reduce the real time expectations to something of the order of several seconds or even minutes.The most common usage is for Load Balancing or Fault Tolerance/Disaster Recovery. Load balancing normally means the load is balanced between one or more servers - as one server cannot handle the load, the load spills over to servers which have a lighter load. It also can mean or may entail that clients connecting to one server will continue to connect to that server for the lifetime of their sessions – sometimes called “sticky session” or persistence. Fault Tolerance means the ability of a server to with stand a localized failure and continue to serve clients with minimal disruption of services and minimal data loss. Disaster recovery means the ability to withstand regional failures and continue business operations in another location. I should point out that bi-directional transactional replication is a poor fit for all of these cases since:

1) Replication does not offer predictable latency, so at one instant my latency can be several seconds, while a half day later it can be several minutes, and sometimes several hours. Using replication for disaster recovery means that your exposure to data loss can is unpredictable. Using replication for load balancing is also a poor choice as a client may do some work on one node, and then return seconds later and hit the other node expecting to have the data that was just entered in the first node. Due to replication latency - this data, however, has not yet made its way to the second node. 2) There is no automatic client redirection as found in mirroring or clustering. 3) There is no load balancing or workload distribution; in our case, as the workload on Server A increases, clients will not be automatically redirected to Server B - as in the case of NLB.

With the above points being noted the majority of my bi-directional transactional replication implementations have been meant to address both fault tolerance/disaster recovery and load balancing goals and they do work well.

The goal of some implementations of bi-directional transactional replication is to reduce contention on one server by scaling out the load over a multitude of servers.

This can be a good fit in some cases. For example, with read intensive workloads, you will find that the read activity can be halved by redirecting half of your clients to the second node. In other words, instead of having 100 clients banging away at Server A, you can distribute the load by having 50 of them bang away at Server B (the remaining 50 clients will bang away at Server A).

However, if you are trying to reduce write contention, the dynamics are different. If I have 1000 inserts per second on Server A coming from 100 clients, and I decide to load balance by having 50 clients work on Server A and 50 on Server B, I will still end up with 1000 inserts per second on Server A – 500 from the 50 clients on Server A, and 500 from the replication process carrying over the 500 inserts per second from the clients banging away at Server B. The end result is the same level of writes and twice the licensing/hardware and maintenance costs.

But the transactional footprint can be quite different. If the transactional context of the client operations is such that it causes contention due to the transaction length or footprint, you may find that replicating the transactions to the other node will result in far less resource consumption than the transactional footprint on the originating server.

For instance, consider a case when an insert statement which occurs on the publisher is the result of a 5 table join, as follows:Insert into tableName

Of course the above would be wrapped in a replication stored procedure, I have stripped everything away except the essential insert for clarity.

This replicated insert statement will just be the final state of the row and will not have to reference the 5 table join on the subscriber. The consequence of this is a much smaller resource footprint on the subscriber, and the locks will be held for a shorter time period.

Also, consider this transaction occurring on the publisher:

Begin TranExec Proc1 -- inserts 5 rows into table1 as the result of some lengthy calculationExec Proc2 -- inserts 5 rows into table2 as the result of some lengthy calculationExec Proc3 -- inserts 5 rows into table3 as the result of some lengthy calculationExec Proc4 -- inserts 5 rows into table4 as the result of some lengthy calculation...Exec Proc20 -- inserts 5 rows into table20 as the result of some lengthy calculationCommit tran

As you can imagine, the transaction footprint on the publisher would be far more severe than the simple 100 row insert which would occur on the subscriber - even though the transaction would be held for the entire 100 row insert on both the publisher and the subscriber. On the subscriber, the transaction is merely 100 insert statements – no lookups, no 5 table joins, etc. So a transaction that takes several seconds on the publisher will

• take far less time on the subscriber, • have a minimal IO hit, • and cause significantly less blocking/deadlocking there.

One last item I wish to cover before diving into an example is to explain why bi-directional transactional replication is normally the better choice amidst the other contenders listed below.

Merge Replication

Merge replication can be used, but is generally a poor fit since the tracking triggers add considerable application latency, and the replication latency of the synchronization tends to be much higher than the alternatives. There is also the added complexity of a merge replication solution and the guid column requirement.

Peer to Peer Replication

Peer to Peer replication can be used, but it is much slower than bi-directional transactional replication and has a much smaller feature set. For example, in Peer to Peer replication I cannot use custom stored procedures, there is no row or column filtering, or automatic identity range management. The Enterprise Edition requirement normally makes Peer to Peer replication a costly choice.

Here is an illustration of just how slowly Peer to Peer replication will perform: to replicate 10,000 inserts with a CommitBatchSize of 100 and CommitBatchThreshold of 1000 (the defaults) takes an average of 6.325 s over 10 runs using bi-directional transactional replication, and an average of 12.939 s over 10 runs using peer to peer replication. Keep in mind that we are looking at an edge case and YMMV, but, in general, bi-directional transactional replication is significantly faster than Peer to Peer. On the downside, you cannot practically scale bi-directional replication to more than 2 nodes – it can be done, but performance degrades significantly the more nodes you add to your topology. Just an aside - Peer to Peer can’t be scaled beyond 10 nodes easily.

Updatable Subscriptions

Updatable Subscriptions are another option - however; they are intended for topologies where the majority of the transactions originate on the publisher. Immediate updating is a split write, and performance degrades as a function of the link speed/bandwidth for subscriber side originating transactions. Should the link go down between the publisher and subscriber, the published tables will be read-only on the subscriber. Queued is an option, but again - it is tuned for the majority of the DML originating on the Publisher, not the Subscriber. Also keep in mind that Updatable Subscriptions are deprecated.

Prepwork

Setting up bi-directional transactional replication requires two things:1) All identity columns must be set a. to NOT FOR REPLICATION, andb. odd on one side and even on the other c. the increment should be adjusted to 2 d. and reseed to hand out an odd value on the odd side and an even on the other.2) All triggers and constraints should be set to NOT FOR REPLICATION. You don’t want the trigger to fire twice as it has already done its work on the publisher (auditing may be an exception here).

This can involve considerable work. Marking identity column NOT FOR REPLICATION is easy using the proc sp_identitycolumnforreplication. Changing the current identity value to odd or even is again simple using DBCC CheckIdent. However, changing the increment involves recreating the table.

Normally, I would do as follows:

• Script all triggers, constraints, indexes• Script out the tables and recreate the new ones with a _temp extension• Move the table data from the original table to the new table using bcp.• Drop the original table and rename the new table the old name, ie OriginalTable data is copied to OriginalTable_temp, OriginalTable is dropped and OriginalTable_Temp is renamed as OriginalTable• Apply the triggers, constraints and indexes.

You will likely need to figure out which are your largest tables, and then group them into batches you can do simultaneously. For databases with large numbers or tables, doing this sequentially can be very time consuming and will involve considerable downtime. You may want to group your tables by size and then run multiple bcp scripts at one time - each dedicated to different tables to minimize your downtime.

Doing this on a live production system, however, requires a different approach:• Create the first publication/subscription pair (Server A and Server B in our case). Ensure you use the no-sync subscription option.• Ensure that the distribution agent is disabled. • Backup the database and restore it on the Subscriber (Server B).

Now you have all sorts of time to get your subscriber setup as the transactions occurring on the Publisher (Server A) will pool in its distribution database.

Set all identity columns on Server B to have an island of ranges - several orders of magnitude above what is being currently assigned on the original source. For example, on Server A you may be currently assigning identity ranges in the millions. Then, set your range on the subscriber to 100 million. This gives you a lot of wiggle room before primary key collisions occur when the same identity range value is assigned on both sides. Naturally, you want some alerting job that will send out email when you are exhausting a range, or when one range is likely to collide with another.

Setup

Setting up bi-directional transactional replication is simple. Once you have prepped your tables, all you need to do is ensure that the @loopback_detection parameter of sp_addsubscription is set to “true”.

Note, that you may need to wait a couple of seconds for the above two queries for return 20 rows. Rows originating from BIDINodeA will have a value for the OriginatingDB column of BIDINodeA, and rows originating from BIDINodeB will have a value for the OriginatingDB column of BIDINodeB.

Caveats

There are some caveats with Bi-Directional Transactional Replication.

Non-Convergence

The first is that should a failure event occur, which might cause a lack of data consistency (i.e. different data on both sides), the distribution agent will fail/stop until the administrator can step in and clear the event which is causing the failure. This is normally a good thing, as it prevents your databases from losing consistency (i.e. having different data). This event will likely be a conflict – an attempt to:

• insert a primary key value on a node which causes a primary key violation; • update a row which is deleted; • delete a row which is already deleted.

Most frequently the goal in a bi-directional replication topology is to present two data sets to your clients which are within seconds of synchronization with each other.

When a data consistency error occurs and the agents shut down the two data sets will get progressively more and more out of sync; in other words they will become disconnected islands of data as each set of clients is entering/modifying data on each island. An insert occurring on Node A will stay in Node A. If the client is redirected to node B and does not see their data – chances are that they will try to re-enter their data this time on Node B – which when the synchronization process occurs, will cause a conflict.

It becomes critical to clear any data consistency errors as quickly as possible to prevent further conflicts. For some implementations this can be difficult. Some implementation it is very simple. I normally have email alerts going off to the team when these events occur. Some implementations never raise alerts and it makes me check them to make sure they are working. Others are very noisy. For example one company feels the need to replicate the ASP.NET state tables. This results in a lot of noise, all of it false positives – well so far - all of it false positives.

Another factor to consider is there is apparently a bug in bi-directional transactional replication which will raise “row not found” error messages. I have only heard about is anecdotally.

Quite often it is more important to keep the agents running, and then run a validation to determine where the error is, as opposed to let the agents stop, investigate and clear the error. The reason why it is more important to keep the agents running is that the longer the agents are not running for, the greater the chances of future conflicts occurring is.

The best way is to skip the error(s) using the Continue On Data Consistency Error Profile. However, this means that if errors are skipped there is a chance your database will get progressively more and more out of sync or different.

It then becomes critical to run publication validations to find out what tables are different.

I normally schedule validations at least daily and run the validation twice to see if the validation error occurs on the same tables for both validation runs. My reasoning is the first time a validation error occurs, it could be due to in-flight data. If the table shows up with a validation error twice in a row, it likely is a true out of sync condition as opposed to inflight data.

Non-Convergence and swapping

Another factor to consider is that if I update the same column on the same row on both sides of a bi-directional replication topology, my values will swap.

Consider this update.

Update TableName1 set Col1=’NodeA’ where PK =1 -- on NodeAUpdate TableName1 set Col1=’NodeB where PK =1 -- on NodeB

Before my synchronization the state of the row will look like this:

Select * from TableName1 where PK=1 -- on NodeA1, NodeASelect * from TableName1 where PK=1 -- on Nodeb1, NodeB

After synchronization it will look like this:

Select * from TableName1 where PK=1 -- on NodeA1, NodeBSelect * from TableName1 where PK=1 -- on Nodeb1, NodeA

The values have swapped. Granted the chances of such an update happening are slim, but it can happen. Ideally with your data perfectly partitioned only one client will be active on one node at one time and will only work with data he/she owns so the chances of swapping are slim.

DML Changes

Bi-Directional Transactional Replication can replicate DML changes. However there is a caveat. A change made on Node A will be replicated to Node B and the replication stored procedures Node A uses to apply commands to Node B will be updated for the schema change on Node B. But the replication stored procedures that Node B uses to replicate changes occurring on Node B to Node A will not reflect this change.

It becomes necessary to regenerate these stored procedures. Either through a database trigger or a SQL Server job than runs at periodic intervals. I use • sp_scriptinsproc• sp_scriptupdproc

For this. I don’t need to worry about the delete procs as they only need to account for the PK.

As you can see bi-directional transactional replication can be a good fit for some scenarios.Before you implement a bi-directional. Ensure you test with a representative workload before deploying into production.

Comment Notification

Comments

I presently have merge replication setup and it is running fine. I have a Replication Alert set up for Merge Conflicts/sec counter. There was a conflict recently which I resolved using the Interactive Conflict resolver. Now the Merge conflicts table is empty both on publisher and subscriber. But the Alert still fires as if there are conflits which is not the case. I disbaled the alert for now. How do I reset the counter/Alert so that is stops sending the false alerts?

Thanks for this article, it's a great start for us who like to script rather than use GUI's!

I've been playing around with your bi-directional transactional replication example, and it works great. Now, I'm trying to modify the script to replicate the db on two different servers, but can't seem to get it right. Can you explain how to do it?

I have a problem while setting up a BiDirectional replication, thought I cannot understand if it's bug or feature.

We are using @sync_type = N'initialize with backup' because our databases are quite big and it's the fastest way to copy the database to the subscriber. If I'm adding @loopback_detection = 'true' to the sp_addsubscription procedure all transactions arrive to Distributor but NONE of them are being sent to the subscriber. No errors occur, dist agent states "No replicated transactions are available". If I change @sync_type to be 'replication support only', all is working fine.