Jar-O-Skittles: Benefits of Table Partitioning

Quick note to the regular reader: don’t worry, still planning on finishing the series on Building a Scale-Out OLAP Farm in the Cloud with Azure VMs. But I’m afraid there’s not much time to work on that until we ship some important code (which I’ve been working around the clock on for 2 months). And so to buy some time and appease the masses demanding moar! from this blog, below is a post I’ve had in the archives for a while now.

Table partitioning adds development overhead (read: time/money) through additional complexity. It also requires an Enterprise license of SQL Server which – if you’re not already running enterprise edition – will add additional cost. However, it’s also an outstanding feature when designing a database with large tables and can easily pay for itself when used appropriately.

Most Developers and DBAs, when asked what’s so great about table partitioning, tend to respond with 1 or both of the following reasons:

Query Performance (typically the developer)

Data Maintenance (typically the DBA)

These are perfectly valid – but there’s a 3rd reason that is often overlooked and absolutely magnificent when it comes to ETL. But before I go into details on that 3rd use-case, let’s cover some basics while reviewing the first two…

Query Performance

To explain how partitioning can help query performance, let’s start with a basic example consisting of a large table containing sales transactions from the past 7 years. If a user wants to know the total sales amount by customers from transactions occurring in 2013 they might write a query like the one below:

SELECT CustomerKey, SUM(SalesAmount)
FROM dbo.Sales
WHERE OrderDate >= 20130101 AND OrderDate < 20140101
GROUP BY CustomerKey

Ignoring (for the sake of the argument I’m trying to make) any secondary indexes in place to help reduce the number reads needed to satisfy this query… it is likely that the optimizer will scan the entire table. If the table is large, this scan could be time and resource intensive. However, if we partition the table by OrderDate and create a single partition for each year:

The optimizer will recognize that it doesn’t need to look at the entire table. Instead, it can scan (or seek – depending on the indexing and filter predicate) a smaller subset of the table’s partitions…in this example it would only need to scan the 2013 partition.

To help understand what’s happening under the covers let’s simplify the initial example and use the good old “jar-of-skittles” example (I hate jelly beans). Instead of thinking about a table of records, let’s think of it as a jar of skittles…where the jar is a table, and each skittle is a record in the table.

Partitioning applied to this construct would look like the following:

Row – a skittle

Partition – a smaller jar used to hold skittles of a certain type

Partition Key – color of skittle (could be any number of attribute e.g. weight, circumference, etc)

Partition Scheme – rule assigning a buckets to a specific spot on the kitchen table

Instead of a single jar-of-skittles all mixed together, we have multiple smaller jars each with skittles of the same color.

Now let’s say you want to analyze the skittles – perhaps you want to count number of purple skittles…

SELECT COUNT(*)
FROM dbo.Skittles
WHERE Color = "Purple"

Before partitioning the skittles, we’d have to evaluate every single skittle – if it’s purple, add 1, if not, add 0. On the other hand, with the partitioned skittles, we can go directly to the bucket of purple skittles and count them quickly.

Alas, this example can also highlight how partitioning can sometimes hurt query performance. Imagine we have partitioned our skittles by color, but now we want to count the skittles with a circumference > the median (assuming a non-uniform distribution – again for the sake of argument). Now we’re back to having to evaluate every single skittle, 1 by 1, only this time, we have to go jar to jar. Assuming there is an overhead for going into a jar, it is actually more expensive to scan every skittle after they have been separated into separate jars.

Below are 2 other aspects of partitioning that relate to query performance and are worth a glance:

Partition Schemes and Disk I/O

Partitioning scheme is the object created that maps a specific partition to a specific filegroups (which map to data files; which map to disks). This adds quite a bit of flexibility (depending on your disk subsystem) to a partitioning strategy. For example, you might map partitions containing older/less queried data to filegroups comprised of data files on slower/cheaper storage while keeping more recent data/most queried data on your faster storage. There are also maintenance-related implications which will be described in the Data Maintenance section below.

Jar-of-Skittles analogy: you might consider placing the jar with your favorite color (i.e. purple) closer to you on the table for faster access.

Asymmetrical Partitioning

It’s not required to have symmetrical partitions – that is, where each partition is the same size in whatever units are being used. Instead of partitioning a table by year (i.e. symmetrical partitions) consider the following example below where the partitions are asymmetrical:

In this example, the oldest/least used/never changing data (e.g. 2012 and 2013) may be partitioned by year while and more recent data (e.g. 2014) may be partitioned by month. In fact, you might even go a step further and partition data for the current month by day.

Note: Using asymmetrical partitioning is more common with SQL Server 2005/2008 where the number of partitions for a single table was limited to 1000. That limits you to < 3 years of data if partitioning by day. Now, starting with SQL Server 2012, tables can have up to 15k partitions – which comes out to ~41 years of data if partitioned by day…so you’ve got a bit of flexibility.

By the way, there’s also a subset of asymetrical partitioning – referred to as “matrix-partitioning” – where a large table may be partitioned on a combination of customer/year. This is, however, an edge case that should be thoroughly thought through and tested before implemented.

Jar-of-Skittles analogy: imagine less jars where most contain only skittles of a single color (e.g. Jar1 = red, Jar2 = purple, Jar3 = green) and another jar containing the rest of the skittle colors (e.g. Jar4 = [yellow,orange]).

Data Maintenance

As data in a database grows old it becomes less useful. Business users, are typically only interested in the new “shiny” data. And so we start to notice that fewer and fewer queries are requesting this old data until eventually it’s been a few months, maybe even a year, since a single query has asked to see this once popular data.

When you have a ton of data that is rarely/never queried, taking up a lot of room on an expensive piece of hardware (materials + licensing) you have 3 options:

Do Nothing

Delete it

Archival

Option 1 sounds like a joke, but it is actually a legitimate option – sometimes you’re just not sure whether you’re going to need the data again and/or you don’t have the time/skills needed to archive it.

As for options 2 and 3 (i.e. Delete vs Archive) – this is really a matter of requirements. Perhaps industry regulation requires maintaining the data for a defined period of time (e.g. 10 years, 15 years, forever). In this case, businesses can save quite a bit of money by moving the data to older, slower, much less expensive disks in the data center. On the other hand, if there’s no retention policy, the business might just take a backup and delete the data.

Regardless of the decision, there’s a right way and a wrong way to move the data out of a large and heavily used table in a production environment.

One option is to copy it to a new table (via SELECT * INTO…) and delete it from the old table (via DELETE FROM…). Do this in a single transaction only of you want to be at the top of your operational dba’s to-kill list.

A better option is table partitioning which will allow you to create a target “archive” table with identical characteristics (i.e. columns, constraints, indexes, etc) and use the partition switch operation. This is a metadata operation, and the effect is that the old data in your large heavily used production table will now be in the target “archive” table and in its place will be an empty partition which can be removed via the partition merge function. Now you can move the target “archive” table to another filegroup on separate (cheaper) storage, mark it read-only, take a filegroup backup, and forget about it. Or you can drop the target “archive” table if you plan on deleting the old data.

Bottom line you have better options and you didn’t have to fill the transaction log and bring down a production system to archive old data that no longer gets queried. But – and this is a big **but** – it is very important to lead with a partitioning strategy. If your table gets large before you’ve implemented partitioning, it’s much harder to course-correct. To partition an existing table will require data movement which is what we’re trying to avoid.

Maintenance related to indexex and statistics are also worth a mention. With a partitioned table you can rebuild indexes for individual partitions instead of for the entire table. This is handy when you have several partitions where the data changes frequently (e.g. a partition containing data for the current day/week). Rebuilding an index for a single partition is much cheaper than doing if for the entire table. Same goes for statistics. Different partitions may be queried differently. Being able to create (and update) custom statistics on a small subset of a large table is much cheaper.

Jar-of-Skittles analogy: let’s say we want to get rid of all the green skittles (nobody likes green skittles and they’re simply taking up space on the table). Without partitioning our skittles, we’d have to comb through the big jar containing all skittles and separate the greens into another jar to stick in the pantry. However, if we’ve partitioned out skittles, we can simply take the jar of green skittles and empty it into another jar and stick it up in the pantry.

Just like the partition-elimination example, if the partitioning strategy (e.g. color) doesn’t line up with your archival/maintenance strategy (e.g. circumference), you’re not going to see much benefit.

Loading Data

Ok – so now that we’ve touched on the 2 most common reasons for using table partitioning…query performance and data maintenance…it’s time to discuss a 3rd, often forgot about, reason: data loading.

Inserting data into a table can be a slow and expensive process depending on a number of factors such as indexes and filegroup configuration. Another consideration is concurrency…the trade off between speed of inserting data and high concurrency can be an important one – especially if you’re trying to reload a data warehouse. For example, data can be loaded into a table very quickly by locking the entire table (e.g. Bulk Update Lock or TABLOCK) but the cost comes in the form of reduced concurrency. Once the TABLOCK is acquired, all other DML operations will be blocked. If you need to do both simultaneously, then you will need to drop the TABLOCK and your inserts might slow down a bit…and even then you need to consider the risk of lock escalation…if your making changes to enough data in the table, the X-lock could escalate to a table lock.

However, if the target table is partitioned, data can be loaded into a staging table and then a SWITCH command can be executed to swap the staging table into the target partitioned table. The stage table can be a HEAP and you can make use of the TABLOCK to get maximum insert speed without blocking operations against other partitions in the target table. But wait, there’s more — be sure to define the table with the LOCK_ESCALATION = AUTO option to limit lock escalation to the partition-level (preventing it from rising to the table level where it would then block simultaneous DML against different partitions).

Sounds awesome, cause it is!

True, you have to create matching indexes and constraints before swapping the stage table into the target partitioned table via the SWITCH command (which is “near-instantaneous”)…but that’s a cost you would have paid either way.

Note: the partition SWITCH operation requires a SCH-M lock on the target table so it will be blocked by existing locks…and it will block lock requests from sessions behind it. But once the SCH-M lock is acquired. the operation is instantaneous…which is why i say “near-instantaneous” operation.