Wednesday, July 2, 2014

Looking at tracer tokens

People seem to be scared of replication in general. I can't say that I blame them; it's reputation for being fragile and non-intuitive is (in my opinion) well-deserved. However, there are some nice things. One of my favorite "costs almost nothing" features is tracer tokens.

The way that I think of a tracer token is a synthetic transaction that keeps track of when it makes it to each of the publisher, distributor, and subscriber. For instance, I can post a token today at 10:00:00 and it will tell me that it made it to the distributor at 10:00:03, and finally to the subscriber at 10:00:05. Moreover, the distribution database keeps track of all tokens posted historically (up to the history retention for the distributor).
How can we use this for monitoring? Well, the first step in doing this is to automate the posting of the tracer tokens. Most people use the "Post tracer token" button to introduce ad hoc tokens into the system. That isn't great for monitoring a system that's already behind as the actual problem may have started a while ago! Fortunately, the solution is simple: schedule a SQL Agent job to execute sp_posttracertoken. I leave it as an exercise to the reader to write a job that loops through all of the publications in the database and post a token for each one. Keep in mind that how often you schedule this to run is the granularity of your monitoring. So choose a frequency that's right for your environment.
We also need an easy way to read the data. For this, I wrote a couple of views. The first one just gets the raw data out of the tokens and presents it in an easy-to-read format.

The second uses the first to find (for each publication that this distributor services) the latest token that has made it all the way to the subscriber.

USE [distribution]
GO
IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[repllication_lag]'))
DROP VIEW [dbo].[replication_lag]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
create view [dbo].[replication_lag] as
with cte as (
select *, row_number() over (partition by [publisher], [publisher_db], [publication], [subscriber] order by [publisher_commit] desc) as [r]
from [dbo].[tokens]
where subscriber_commit is not null
)
select *
from cte
where [r] = 1

I like to set up alerting off of this second view based on the publisher_commit column (e.g. "select * from replication_lag where publisher_commit < dateadd(minute, -30, getdate())"). If the newest token to make it to the subscriber is "old" (for whatever definition of "old" you'd like to use), raise an alert.