SQL Server 2005 BCP Partitioning

Introduction

In SQL Server 2000 when an article is
BCPd to the file system (distribution working folder) during the snapshot
generation, there is always just one file used for the data. Conversely, in
SQL Server 2005 when you look in the distribution working folder after
creating a snapshot you might be surprised to find many such files for each
article, each containing a part of the table data.

Clearly there has been a big change in
the processing rules, which is not documented in BOL in any detail. I'll
refer to this overall process as "BCP Partitioning" - getting the term from
developers posting in the Microsoft Replication Newsgroup. This article
explains why BCP Partitioning exists, what to expect to occur and how to
troubleshoot if it all goes wrong.

Why was it created?

There are several benefits of BCP
Partitioning. Firstly, when the snapshot is being applied to the subscriber,
there might be a network outage. In SQL Server 2000 this would mean that the
complete snapshot would need to be reapplied, and in the case of "concurrent
snapshot" this will be all in one transaction. However, if you have a SQL
Server 2005 distributor and SQL Server 2005 subscribers there is now a much
greater granularity in the process. Each partition is applied in a separate
transaction, meaning that after an outage the snapshot distribution is able
continue at the partition level where it left off and complete just the
remaining partitions. For a table containing a lot of rows this could lead
to a huge saving in time. Other useful side-effects are that this can cause
less expansion of the transaction log (assuming the delivery spans a log
backup, or we're using the simple recovery model) and that it allows the BCP
process to run in parallel on machines with more than one processor
(parallel execution did exist in SQL Server 2000, but only across several
articles, never within a single table). The same benefits apply when the
snapshot agent creates the initial snapshot.

Testing It

To test the number of files produced and
to try to investigate the algorithm, I created a simple table (below) and
populated it.
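The original table definition isn't reproduced here, so as a sketch: a
minimal version, using the testbcp table and PK_TestBCP key name referenced
later in this article (the column names and row count are my own
assumptions), might look like this:

```sql
-- A minimal test table (column names are illustrative assumptions);
-- the clustered PK gives us the PK_TestBCP statistics referenced later
-- by DBCC SHOW_STATISTICS
CREATE TABLE testbcp
(
    id     INT NOT NULL CONSTRAINT PK_TestBCP PRIMARY KEY CLUSTERED,
    filler CHAR(100) NOT NULL DEFAULT 'x'
);

-- Populate with enough rows to comfortably exceed the 1000-row
-- partitioning threshold discussed below
DECLARE @i INT;
SET @i = 1;
WHILE @i <= 100000
BEGIN
    INSERT INTO testbcp (id) VALUES (@i);
    SET @i = @i + 1;
END;
```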

On my machine I initially found a simple
dependency on the number of rows which, on further investigation, was
completely unrepeatable! That is, the number of BCP files seemed to vary
with the size of the table in unpredictable ways. Quite why this occurred is
fully explained in the troubleshooting section below.

Results

(a) For 8 processors or less, the formula
used to calculate the number of partitions is:

default # partitions =
NUMBER_OF_PROCESSORS x 4

So, in my case as I have 4 processors, I
would expect there to be 16 partitions which is exactly what I saw. Even for
1 processor there could be 4 partitions, which reinforces the fact that
multithreading is only part of the reason behind the algorithm. Note that
the -BcpBatchSize parameter of the snapshot and distribution agents simply
governs how often progress messages are logged and has no bearing at all on
the number of partitions.

(b) There is a minimum threshold of
1000 rows, below which no partitioning occurs.

(c) The data was found to be evenly
distributed amongst the partitions (data files).
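Putting rules (a) and (b) together, the expected number of data files can be
sketched as a quick calculation (the behaviour for more than 8 processors
wasn't tested here):

```sql
DECLARE @cpus INT, @rows INT, @partitions INT;
SET @cpus = 4;        -- my 4-processor test machine
SET @rows = 100000;

IF @rows < 1000
    SET @partitions = 1;          -- below the threshold: one data file
ELSE IF @cpus <= 8
    SET @partitions = @cpus * 4;  -- 4 processors -> 16 partitions

SELECT @partitions AS expected_partitions;
```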

Troubleshooting

(a) "why is the number of partitions
incorrect"?

In many cases, as mentioned above, what I
found wasn't in accordance with the 3 rules. Sometimes I had expected to
see many partitions and all the data was BCPd to just one file; in others I
had fewer than 1000 rows yet many partitions were created, most of which
were empty.
Running DBCC SHOW_STATISTICS - eg DBCC SHOW_STATISTICS ('testbcp','PK_TestBCP')
- revealed the problem. It turned out that the statistics were out of
date, and the replication processes don't update the stats themselves.
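The commands involved, using the table and statistics names from the DBCC
command just mentioned:

```sql
-- Check whether the statistics row count matches reality
DBCC SHOW_STATISTICS ('testbcp', 'PK_TestBCP');  -- note "Rows" in the header
SELECT COUNT(*) FROM testbcp;                    -- actual row count

-- If the two disagree, refresh the statistics before (re)generating
-- the snapshot
UPDATE STATISTICS testbcp;
```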

In the stats output, the
"Rows" total should be identical to that returned from "select count(*) from
testbcp". Running UPDATE STATISTICS testbcp in all cases brought these
totals back in line, and subsequently the number of partitions
created agreed with the 3 rules.

(b) Disabling BCP Partitioning

To disable BCP Partitioning, you can add
the unofficial "-EnableArticleBcpPartitioning 0" switch to the snapshot
agent as shown below, and a single data file will be produced, just as in
SQL Server 2000:
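For example, when running the snapshot agent (snapshot.exe) from the command
line - the server, database and publication names here are placeholders -
the switch is appended like any other agent parameter:

```
snapshot.exe -Publisher MYSERVER -PublisherDB pubdb -Publication pub1
    -Distributor MYSERVER -EnableArticleBcpPartitioning 0
```

If the agent runs as a SQL Server Agent job, the switch can instead be
appended to the command text of the Run Agent job step.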

Why would you want to turn off such a
useful feature? Well, anecdotally, things may get worse for folks who don't
start off with empty tables (archiving or roll-up scenarios), or who use
concurrent snapshot (the default in SQL Server 2005), and any or all of CPU,
disk I/O and network bandwidth can become the bottleneck when BCP
partitioning tries to extract more snapshot processing throughput.

(c) Ensuring Bulk-Logging

For those tables which really expand the
transaction log, some DBAs like to enable the bulk-logged recovery model to
minimise logging, but this won't always work when we are dealing with
multiple partitions. To maximise the chance of going down the bulk-logged
path, either use -MaxBcpThreads > 1 for the distribution agent and ensure
the target table has no indexes before the distribution agent delivers the
snapshot, or just use -MaxBcpThreads = 1 to turn off parallelism, although
the latter option obviously might reduce performance.
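As a sketch, the recovery-model switch on the subscriber side (the database
name is a placeholder) looks like this:

```sql
-- Before snapshot delivery: minimise logging of the bulk loads
ALTER DATABASE SubscriberDB SET RECOVERY BULK_LOGGED;

-- ... let the distribution agent deliver the snapshot ...

-- Afterwards: return to the full recovery model
ALTER DATABASE SubscriberDB SET RECOVERY FULL;
```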

Conclusions

The new BCP Partitioning functionality
offers many improvements - initialization is generally faster and
certainly more resilient. Mostly we'd never need to know that this
mechanism had changed in SQL Server 2005 and could simply accept that
somehow the system seems improved and faster, but hopefully this article has
shed a little more light on what goes on behind the scenes and will help if
troubleshooting is ever called for.