
We're busy load testing an OLTP system we've developed in .NET 4.0, with SQL Server 2008 R2 as the back end. The system uses SQL Server Service Broker queues, which are very performant, but we are experiencing a peculiar trend while processing.

SQL Server processes requests at a blistering rate for about a minute, followed by ~20 seconds of increased disk write activity. The following graph illustrates the problem.

So far we have tried the following:

Moved away from a message queue driven approach to a simple/crude table monitoring design.

Used different loads from light to heavy.

Fixed all deadlocks.

It seems as if SQL Server might be accumulating changes in its cache and writing them to disk at fixed time-based intervals, but I can't find anything online to support this theory.
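One way to check the theory without PerfMon is to sample the standard Buffer Manager counters straight from T-SQL; a minimal sketch:

    -- Sample the relevant Buffer Manager counters (same data PerfMon shows).
    -- NB: 'Checkpoint pages/sec' is cumulative; sample it twice and take the
    -- difference over the interval to get an actual per-second rate.
    SELECT counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Buffer Manager%'
      AND counter_name IN ('Checkpoint pages/sec', 'Page life expectancy');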

Next, I plan to move the solution over to our dedicated test environment to see if I can replicate the problem. Any help in the interim would be greatly appreciated.

Update 1
As requested, herewith a graph that includes the Checkpoint Pages/Sec, Page Life Expectancy, and some disk latency counters.

It appears as if the Checkpoint (light blue line) is the cause of the reduced performance (yellow line) we're observing.

The disk latency remains relatively consistent during processing, and the page life expectancy does not seem to have any noticeable effect. We also adjusted the amount of RAM available to SQL Server, which did not have a big effect either. Changing the recovery model from SIMPLE to FULL also made little difference.
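For reference, the recovery model actually in effect, and what the log is currently waiting on, can be checked per database:

    -- confirm the active recovery model and the current log reuse wait
    SELECT name, recovery_model_desc, log_reuse_wait_desc
    FROM sys.databases;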

Update 2
By changing the "Recovery Interval" as follows, we've managed to reduce the interval at which checkpoints occur:
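A sketch of the kind of change described, using the documented sp_configure option (requires sysadmin; the value of 1 minute here is illustrative, not necessarily the value we settled on):

    -- 'recovery interval (min)' is an advanced option, so advanced
    -- options must be enabled first
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'recovery interval (min)', 1;  -- illustrative value
    RECONFIGURE WITH OVERRIDE;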

Add the Checkpoint pages/sec counter, test again, and show the graph. While your transactions go down and writes go up, are you seeing performance issues? I'd also add some disk latency counters: Avg. Disk sec/Read and Avg. Disk sec/Write.
– Mike Walsh Nov 30 '12 at 13:12

And when you post the next graphs, can you include the numbers? That graph doesn't show any scale.
– Mike Walsh Nov 30 '12 at 13:13


And one last thing (sorry!) - What is the memory on this server? Can you add the page life expectancy counter as well? Can you describe the physical setup (memory, IO setup, have you split your log and data files, etc.)?
– Mike Walsh Nov 30 '12 at 13:18


Which recovery model is the database in? This looks like automatic checkpointing as the transaction log fills up. Note that even if the database is in FULL or BULK_LOGGED, it still behaves as if it's in SIMPLE until you take a full backup.
– Jon Seigel Nov 30 '12 at 17:43


Jon - Checkpointing will still happen regardless of the recovery model. Simplified: the only difference between the recovery models is what happens to the data in the log after a checkpoint. In FULL it stays in the log and needs to be backed up; in SIMPLE it can be truncated (or marked for truncation/reuse), but the checkpoint still has to happen.
– Mike Walsh Nov 30 '12 at 18:03

1 Answer

Others have already pointed out the culprit: SQL Server accumulates updates in memory (in the buffer pool) and only flushes them out periodically (at checkpoints). The two options suggested (-k and the checkpoint interval) are complementary:

lowering the recovery interval will cause checkpoints to start more often

the -k startup option throttles the speed at which a checkpoint writes to disk (in MB/sec), flattening the I/O spike

But I did not respond only to regurgitate the fine comments you received so far :)

What you're seeing is, unfortunately, very typical behavior for queued processing. Whether you use Service Broker queues or opt for the tables-as-queues approach, the system is very prone to this kind of behavior. This is because queue-based processing is write heavy, even more write heavy than OLTP processing. Both the enqueue and dequeue primitives are write operations, and there are almost no read operations. Simply put, queue processing generates the most writes (= most dirty pages, and most log) of any workload, even OLTP (i.e. a TPC-C-like workload).
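To make the point concrete, here is a minimal tables-as-queues sketch (the table dbo.WorkQueue is hypothetical); note that both primitives are pure writes:

    -- hypothetical queue table; the clustered key follows insert order
    CREATE TABLE dbo.WorkQueue (
        Id      BIGINT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
        Payload VARBINARY(MAX) NOT NULL);

    -- enqueue = INSERT (dirties a page)
    INSERT INTO dbo.WorkQueue (Payload) VALUES (0x01);

    -- dequeue = DELETE of the oldest row (dirties the page again);
    -- READPAST lets concurrent consumers skip rows locked by others
    WITH q AS (
        SELECT TOP (1) Id, Payload
        FROM dbo.WorkQueue WITH (ROWLOCK, READPAST)
        ORDER BY Id)
    DELETE FROM q
    OUTPUT DELETED.Id, DELETED.Payload;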

Very importantly, the writes of a queue workload follow an insert/delete pattern: every row inserted is very quickly deleted. This is important to distinguish from the append-only pattern of an insert-heavy (ETL) workload. You are basically feeding the ghost cleanup task a full meal, and you can easily outrun it. Think about what that means:

enqueue is an insert: it creates a dirty page

dequeue is a delete: it dirties the same page again (it may get lucky and catch the page before a checkpoint, avoiding a double flush, but only if it's lucky)

ghost cleanup will clean up the page, making it dirty yet again

Yes, it really means that you may end up writing a page to disk three times, in three different IO requests, for each message you process (worst case). And it also means that the random IO of checkpoints will be truly random, since the pages dirtied between two checkpoints are scattered and the moving disk heads must visit each one again (many OLTP workloads, by contrast, tend to group their writes on a few 'hot spots'; queues do not).

So you have these three write points, racing to mark the same page dirty again and again. And that is before we consider any page splits, which queue processing may also be prone to because of the insert key order. By comparison, 'typical' OLTP workloads have a much more balanced read/write ratio, and the OLTP writes are distributed across inserts/updates/deletes, often with updates ('status' changes) and inserts taking the lion's share. Queue processing writes are exclusively inserts/deletes with, by definition, a 50/50 split.

Some consequences follow:

Checkpoint becomes a very hot issue (no longer a surprise for you)

You'll see heavy fragmentation (the fragmentation per se won't matter much, as you are not going to do range scans, but your IO efficiency suffers and ghost cleanup has more work to do, slowing it down even more); see the sketch after this list

Your MDF storage random IO throughput is going to be your bottleneck
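A quick way to see the fragmentation and ghost-record buildup for yourself (a sketch; dbo.WorkQueue is the hypothetical table from the earlier sketch, and 'DETAILED' mode is required to get ghost record counts):

    -- inspect fragmentation and ghost records on the queue table
    SELECT index_id, avg_fragmentation_in_percent, ghost_record_count
    FROM sys.dm_db_index_physical_stats(
        DB_ID(), OBJECT_ID(N'dbo.WorkQueue'), NULL, NULL, 'DETAILED');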

My recommendation comes in three letters: S, S and D. Move your MDF to storage that can handle fast random IO. SSD. Fusion-IO if you have the money. Unfortunately, this is one of those symptoms that cannot be resolved with more cheap RAM...

Edit:

As Mark points out, you have two logical disks backed by one physical disk. Perhaps you tried to follow the best practice of splitting the log onto D: and the data onto C:, but alas, it is to no avail: C: and D: are the same disk. Between checkpoints you achieve sequential throughput, but as soon as a checkpoint starts the disk heads start to move, your log throughput collapses, and the entire app's throughput goes down with it. Make sure you separate the DB log so that it is not affected by data IO (separate disk).
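A quick sanity check (a sketch): this lists the drive letter each data and log file sits on; whether those letters map to the same physical disk is then an OS-level question.

    -- list physical placement of every data (ROWS) and log file
    SELECT DB_NAME(database_id) AS database_name, type_desc, physical_name
    FROM sys.master_files
    ORDER BY database_name, type_desc;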

BTW, it would be interesting to know why checkpoint-driven IO causes such a dramatic impact on the application counters. Ideally the application should plow ahead while the checkpoint does its work. Of course, I assume you do not share the LDF and MDF storage access path (if you do, then you deserve it...). Perhaps you have some unnecessary contention points in the application.
– Remus Rusanu Dec 3 '12 at 14:20

Looking at the perfmon counters listed, I suspect you might be right about the data and logs being on the same drive or array.
– Mark Storey-Smith Dec 4 '12 at 2:14

@MarkStorey-Smith: I think you're right, OP has C: and D: logical disks backed by the same physical disk. I doubt that physical disk is a battery of 100 short striped spindles, so this is probably the root cause.
– Remus Rusanu Dec 4 '12 at 6:58

Yes, this test was done on my local dev machine, which only has a single drive. Thanks for the help all.
– Andre Dec 5 '12 at 10:16