When does SQL Server partition split move data?

One of the many benefits when using table partitioning in SQL Server is the metadata only operations, such as switch, split, and merge. However, if we are not careful, splitting partition can potential take a long time if data end up being moved between two partitions.

In this blog post we will take a look at when SQL Server partition split will be a metadata only operation, and when data will be moved between partitions.

According to Books Online, we should avoid splitting or merging populated partitions:

Always keep empty partitions at both ends of the partition range to guarantee that the partition split (before loading new data) and partition merge (after unloading old data) do not incur any data movement. Avoid splitting or merging populated partitions. This can be extremely inefficient, as this may cause as much as four times more log generation, and may also cause severe locking.

But let’s take a look at what is actually happening when we split a partition. By using fn_dblog to read from the transaction log, we can see whether we have data movement or not when we split a partition.

To demonstrate this, I have created a partitioned table with two partitions; one empty and one with 20000 rows. The 20000 rows are split between 10000 rows with the date 20120301 and 10000 rows with the date 20120501. I am using right range partitioning. Using the DMVs (see my previous blog post for a query that gives you information about the partitions), let’s take a look at the two partitions:

Test #1: Split a partition with data on both sides of the new boundary point

In this test we split with a new boundary point 20120401, which means that we will have data on both sides of the new boundary point.

This results in the following output, where we can see that there are 20258 entries in the transaction log.

If we take a further look, we can see that most of the entries are LOB_INSERT_ROWS and LOB_DELETE_ROWS, which tells us that 10000 rows are inserted into the new partition and deleted the old partition.

After this split, the partitions now look like this:

Test #2: Split a partition with data on the left side of the new boundary point

In this test we split with a new boundary point 20120601, which means that we will have data on the left side of the new boundary point, and the right side will be empty.

This results in the following output, where we can see that there are only 19 entries in the transaction log.

Despite we have data in the partition that we split, it is a metadata only operation. Quite a different case than our previous test. In this case, we are using right range partitioning, and if the data is only on the left side of the new boundary point there is no data movement. This makes sense, as the new partition being created is empty.

Test #3: Split an empty partition

In this test we split with a new boundary point 20120801, which means that we split the last partition which is empty.

This results in the following output, where we can see that there are 20307 entries in the transaction log.

If we again take a further look, we can see that most of the entries are LOB_INSERT_ROWS and LOB_DELETE_ROWS, which suggests that all rows are inserted into the new partition and deleted the old partition.

If we take a look at our partitions, we can see that the old partition 2 is now empty (although 1 page is still allocated for it) and the new partition 3 contains all the rows. So in the case where the data is on the right side of the new boundary point, and the left side is empty, all rows are moved.

Left range partitioning

If we had used left range partitioning instead of right range, the results would have been opposite the ones shown above. In this case, there would be data movement if there would be data only on the left side of the new boundary point, and it would be a metadata operation only if there would be data only on the right side of the new boundary point.

Conclusion

While the recommendation in books online is to only split empty partitions, it is possible to split a partition with data and still get a metadata operation only. However, this is undocumented, and could possible change. I have seen designs where the last (or first) partition was not kept empty, and as of SQL Server 2012 that seems safe. As long as we know that there only will be data on the left side (of the last partition, in right range partitioning) or the right side (of the first partition, in left range partitioning) of the new boundary point for the split.

Share this:

Written by David Peter Hansen

David Peter Hansen is a Microsoft Certified Master of SQL Server and works as Consulting Manager for Advectas Denmark. He has worked with database development and administration for 12 years, and has worked with the Microsoft Business Intelligence platform since SQL Server 2000. He specializes in developer coaching as well as scalable architecture and performance tuning on large-scale data warehouses and BI solutions. He is a frequent speaker at SQL Server and BI conferences around the world.