SQL Server: switch partition and metadata inconsistency issue

In this blog post, I would like to share with you a weird issue I faced when I implemented a sliding Windows partition scenario on SQL Server 2014 SP1 at one of my customer. The idea was to keep SQL Server audit records from several SQL Server instances into a global archiving table during two years in order to meet the customer audit specifications. Data older than year – 2 should be deleted. Based on a first implementation, we estimated the total data size to 800GB per year.

So we decided to implement the sliding Windows scenario as follows:

Right-based partition function is used in this context

Two partitions exist to store current and year – 1 audit data

Partition that contains data older than one year will be switch and then drop

One additional empty partition exists to avoid data movement that may lead to high resource consumption (see my previous blog post here)

In addition, we also implemented two stored procedures in order to automate sliding partition process based on the Microsoft article here.

And here came the (weird) issue. I always used the following (simplified) script so far to get table information regardless if a table is partitioned or not.

Let’s say that the partition 1 contains Year-1 data. Si I included this query to detect dynamically which partition to switch inside the stored procedure that deals with the left side of the partitioned table (that includes switch and merge operations) and this is exactly where the problem occurs.

The new scenario is as follows:

After switching the old partition and then merge the existing left boundary, I noticed metadata inconsistency regarding the following error message. In short, this message indicates my query doesn’t detect correctly which partition hosts data to delete because it seems to not be correctly aligned with archive table. They are not on the same filegroup.

You may notice that the output is not pretty the same than the previous query.

This time the partition number 1 seems to be more accurate with the FG_AUDIT_SQL_03 value. So why this difference between the two queries? If we take a further look at the both queries, we may see a noticeable difference with the use of sys.allocation_units DMV in the first query versus sys.destination_data_spaces DMV in the second one.

Let’s divide and simplify the above queries into small pieces and let’s have a look at the following result:

Red arrows concern the first query and the green arrows concern the second one. The arrows represent the value used for joining DMVs together. You may notice the filegroup name is different between the both queries which makes me think that the data_space_id value from the sys.allocation_units DMV is inconsistent in this case. In addition, we may notice two same data_space_id column values for each different value of container_id. The point here is that if I rebuild the corresponding index, this inconsistency disappears as shown below:

After further investigations, I found out the following Microsoft KB3095958 that talks about metadata inconsistency after a partition switch and SQL Server 2014 SP1. Applying this KB on my environment fixed this specific issue in my case.