Index Tuning Week: Fixing Nonaligned Indexes On Partitioned Tables

Unquam Oblite

This post will not change your life, but it will help me remember something.

When you decide to partition a table to take advantage of data management features, because IT IS NOT A PERFORMANCE FEATURE, or you have an existing partitioned table that takes advantage of data management features, because IT IS NOT A PERFORMANCE FEATURE, you may have or end up with nonclustered indexes that aren’t aligned with the partitioning scheme.

But if you’re a silly goose — you know the type — the same kind of person who runs around creating nonclustered primary keys and then wondering why we call their tables heaps? You might create an index like this, or an index like this may be hanging around from before the table was partitioned.

Transact-SQL

1

2

CREATEINDEXix_fun_boy_twoONdbo.ptable(id)ON[PRIMARY];

CREATEINDEXix_fun_bunny_twoONdbo.ptable(filler_bunny)ON[PRIMARY];

Unhappy

Our new indexes are not partitioned! No good can come of this.

So Here’s The Part I Always Bungle

Index rebuilds are great for changing things about an index other than fragmentation. There are all sorts of lovely options.

Your other option would be to create indexes already aligned with the full syntax.

Transact-SQL

1

2

CREATEINDEXix_fun_boy_threeONdbo.ptable(id)ONpscheme(id);

CREATEINDEXix_fun_bunny_threeONdbo.ptable(filler_bunny)ONpscheme(id);

All Lined Up

I forget this a lot because I so rarely see clients using partitioning, at least in a meaningful way.

The conversation always goes something like:

“Did you know this table is partitioned?”

“No, wait, yeah, someone set that up years ago to fix [some problem].”

“So you’re not swapping partitions in and out?”

“Nope.”

“Do you want to?”

“Nope.”

“Cool, moving along!”

In those cases, it’s not worth going through the trouble to align those indexes, especially is one of them is the clustered index. You may be looking at a maintenance window. If your database is part of log shipping, mirroring, or an availability group, processing large index changes can really raise hell.

Yes, though I admit that the partitioning wasn’t actually necessary for performance; the data naturally arrives and is processed in an order conducive to segment elimination. The MSDN explicitly documents that table partitioning can be used to improve performance by ensuring segment elimination. I prefer documented, solutions to hacks relying on implementation details, even if those implementation details are semi-documented.

I’ll note that adding Columnstore databases was motivated by severe performance concerns (primarily with Disk usage). We spent a month or so of developer time investigating non-Microsoft columnstore databases. Then Microsoft started offering columnstore databases in SQL Standard. Shifting to columnstore databases ended up being a magic bullet. While there was probably a week or so of testing/experimentation, the actual solution itself was largely turnkey. Among other benefits, it dropped disk consumption by over 90%.

I’ll note that there are data management headaches 10+ years in my database’s future. Adding partitioning now will allow me (or my successor) to mitigate those headaches.

Very cool. You should totally blog about your experiences with it, when you need it, and when you don’t, when it hurts, etc.

I like reading MSDN blogs, but quite often they’re based on very small data sets (like Adventure Works, or Wide World Importers), and they don’t really show how people out in the real world use a feature.

Right now, there’s so few people blogging about this stuff it would be nice to have another voice out there (especially on Standard Edition, with its CS limitations on memory).

Microsoft has called partitioning for row store a performance feature in various places, so I’m always a bit cautious…

Any time someone suggests partitioning a row store as a performance feature, my default response is to apply the Socratic method:
1. Why do you think partitioning will improve performance?
1.No: That won’t improve performance at all because of X,Y,Z.
1.No2: Let’s test it with a toy example (performance is not improved)
2. Wow, that will improve performance a lot. It will also have [list of costs]. However, it’s still worth it. That said, let’s see if we can divide the data by [column used in partition function] without partitioning the table. Let’s just add:
2PCC. A persisted computed column instead. Here, I’ll convert your partition function into a giant case statement(*). Now, let’s add that column to the left of every index.
2RC. [whatever column the partition function is using] to the left of every index instead. We’re already using this column in the database, so adding it to our indexes won’t be hugely expensive.
3. I realize that step 2 will slow down anything which touches the database…but so would selecting the correct partition, so hopefully we’ll be fine. Now, we’ll just stick that column to the left of every index and we’re done.

Usually some variation on this series of questions either results in a viable alternative to partitioning or helps explain why partitioning is a bad idea. I’ll note that step 2 doesn’t work when discussing CCI, since CCIs don’t have keys.

(*)In practice, a case statement is overly general. More likely the computed column would be something simple like stripping the day from a date or rounding to the nearest 100.

Warning: Both proof by contradiction and the Socratic method come across as very condescending. If working with someone one-on-one, don’t use these methods as-is.

I’m with Brian. Use table partitioning because I receive data at fixed intervals so I made this the partitioning key, which means I can load a staging table and switch data into the final table very quickly. We use clustered columnstore and query individual intervals so we get a big benefit on partition elimination. The final table itself is also very deep (not so wide) so being able to incrementally update stats on individual partitions is also helpful.

TLDR; it’s a performance feature when you have a lot of data and are combining it with clustered columnstore.

“I’m not even asking you guys to back up anything you’ve said about it improving performance”

The reason we’re not backing up our claimed performance improvements is because there aren’t any performance improvements; it’s a loss. If done right (i.e., ensuring that each partition has at least 1 million rows and that each partition lines up with the types of queries you’re running), it’s a very small loss.

Really, the only case where partitioning will improve performance of column-store is on a pathalogical case: The data is being loaded in an order that is completely independent of your partition function. This is pretty rare; the kinds of tables that need partitioning tend to partition on things like dates and identity columns.

On the bright side, a well-chosen partition function will only make performance slightly worse, and makes query performance a bit less dependence on data-load order. I don’t like relying on implementation details, even when the risk of being wrong is low and the cost of being wrong is “only” performance.