Performance Impact: Finding the Optimal Batch Size

In my previous posts (1, 2, 3), I focused on the performance behavior of setting the checkpoint and transaction commit sizes to once every 16 inserts and once every 100,000 inserts. A question remains: what is the optimal size?

In other words, in the following script, what value should we give to variable @batch_size so that the script will complete in the least amount of time?
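The script itself is not reproduced in this excerpt, so here is a minimal sketch of the pattern being described. The table name, column definitions, and total row count are all assumptions for illustration, not the original code:

```sql
-- Hypothetical sketch of the test pattern (table, columns, and row counts are assumptions).
-- Commits and (for the manual-checkpoint runs) checkpoints once every @batch_size inserts.
DECLARE @batch_size int = 100000;
DECLARE @i int = 1;

BEGIN TRAN;
WHILE @i <= 3000000        -- total number of inserts (assumed)
BEGIN
    -- A wide filler row so each insert fills roughly one 8KB data page,
    -- consistent with the "one insert = one page" arithmetic later in the post.
    INSERT INTO dbo.test_inserts (id, filler)
    VALUES (@i, REPLICATE('x', 7000));

    IF @i % @batch_size = 0
    BEGIN
        COMMIT TRAN;
        CHECKPOINT;        -- omit this line when testing automatic checkpoints
        BEGIN TRAN;
    END;

    SET @i += 1;
END;
IF @@TRANCOUNT > 0 COMMIT TRAN;
```

The question, then, is which value of @batch_size minimizes the total run time of a loop like this.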

As mentioned previously, the optimal value for @batch_size--the batch size of the transaction commits and checkpoints--is the one that gets SQL Server to take full advantage of the storage's performance capacity. The performance of this test script is first and foremost a storage issue. You can try to calculate the size of each checkpoint and transaction for a given value of @batch_size, and from that get some idea of what kind of I/Os SQL Server may do. But that approach is not very reliable. It's easier to just run a series of tests to see the performance impact of the @batch_size values.

For the tests, I ran the script with @batch_size set, respectively, to 3,000,000; 2,000,000; 1,000,000; 100,000; 10,000; 32; 16; and 8.

The following two charts show the script run duration for each of the @batch_size values when (1) manual checkpoints were used, and (2) automatic checkpoints were used. For manual checkpoints, the 'recovery interval' sp_configure option was set to 32,767, effectively disabling automatic checkpoints; for automatic checkpoints, the option was left at its default value of 0.
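For reference, the two configurations can be switched like this ('recovery interval' is an advanced option, so 'show advanced options' has to be enabled first):

```sql
-- Enable access to advanced options.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Manual-checkpoint runs: effectively disable automatic checkpoints.
EXEC sp_configure 'recovery interval', 32767;
RECONFIGURE WITH OVERRIDE;

-- Automatic-checkpoint runs: restore the default behavior.
EXEC sp_configure 'recovery interval', 0;
RECONFIGURE;
```

Note that 'recovery interval' is an instance-wide setting, so changing it affects every database on the instance, not just the one under test.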

Fig. 2 is a subset of Fig. 1, showing only the @batch_size range between 3,000,000 and 10,000, because in Fig. 1 the duration difference between manual and automatic checkpoints in this range is completely obscured by the huge values in the range between 10,000 and 8.

Three observations are immediately obvious from these two charts.

First of all, you definitely don't want to commit and checkpoint too often. The adverse impact is rather dramatic (e.g. the duration could increase from several minutes to several hundred minutes).

Secondly, for practical reasons, rather than going through all the trouble of finding an optimal value for @batch_size, it's easier to just rely on automatic checkpoints. In that case, it doesn't make much of a difference what value you set @batch_size to. In other words, the performance of the script is not sensitive to the size of the transaction commits. That's true at least in this particular test setup. No doubt one can find a test setup where the transaction commit size makes a huge difference and should be carefully considered, so don't over-generalize from this observation.

Finally, if you do decide to control the checkpoints yourself, issuing checkpoints too infrequently can also degrade performance, as shown in Fig. 2, though one can argue whether it really matters that much if the script completes in 15 minutes rather than 7 minutes. But the trend of the impact is unmistakable.

My take on the performance degradation when @batch_size was set to 2,000,000 or 3,000,000 is that dirty pages eventually filled up the buffer pool and forced SQL Server to rely on lazy writes to flush pages to disk and make room in memory. Note that in this test 2,000,000 inserts translate into 2,000,000 pages, which is about 16GB--more than the 12GB allocated to the buffer pool of the SQL Server instance. If you think there is a better explanation, let me know.
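The back-of-envelope arithmetic behind that figure: at 8KB per SQL Server data page, 2,000,000 dirty pages come to roughly 16GB in decimal units (about 15.3GB binary), either way comfortably exceeding the 12GB buffer pool:

```sql
-- 2,000,000 pages x 8KB per page, expressed in binary GB.
SELECT CAST(2000000 AS bigint) * 8 / 1024.0 / 1024.0 AS dirty_page_gb;  -- ~15.3
```

Once the dirty pages can no longer fit in memory, page flushing moves onto the critical path of the insert loop rather than happening in the background.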

Comments

Thanks for posting this. Regarding your last paragraph, the reason for the differences is in the logging system. Each time you COMMIT, it collects all the log buffers that contain data for that unit of work (UOW) and synchronously writes them to disk. The dirty data pages are written fully asynchronously, so they don't have a material impact on elapsed time.

Note that there is only one logging system, and the more often it is writing synchronously, the slower the overall database instance will run; most work in the instance stops while this is happening. So it is the impact on the logging system from frequent COMMIT processing that causes the slowdown.