Business Intelligence Developer (MCSE)

Partitioned for Performance

If you’ve done any Microsoft training or certifications in the last few years, you’ve probably noticed they’re really pushing the use of partitioned tables; it’s appeared in at least three of my exams for the MCSA/MCSE. In my opinion, that’s probably a good move, as I believe partitioned tables are another one of those great features in SQL Server that isn’t used as often as it should be, despite the many benefits it brings.

In my last blog on partitioned tables, I talked about the benefits from an archiving and management perspective. In this one, I’ll go over the query performance gains that can be achieved. If partitioning is new to you, it’s worth thinking of it as dividing a large table into smaller ones: the smaller the table, the fewer the records and reads, and the faster the performance. The screenshot shows how this works and, essentially, what we’ll be doing.

Partitioned vs Non Partitioned Table

We’ll be using Microsoft’s sample WideWorldImportersDW database for our demo; a copy of this database can be found here. I’ll assume you know how to restore or attach a database, and focus on the code and what happens as it runs. For those who want to follow along, the code is available here.

To get things started, let’s examine the initial state of the database just after restoring it. Note how the fact table (Fact.Sales) is already partitioned (seven partitions) and the database itself already comprises several filegroups. By comparison, a newly created database has just one filegroup by default.

The fastest way to see if a table is partitioned: right-click it, select Properties, then the Storage page. As seen here, this table is already partitioned.
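If you prefer T-SQL over the GUI, the catalog views give the same answer. A minimal sketch (the table name follows this demo; a partitioned table reports more than one row here):

```sql
-- Partition count and row distribution for a table,
-- looking at its heap (index_id = 0) or clustered index (index_id = 1)
SELECT p.partition_number,
       p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('Fact.Sales')
  AND p.index_id IN (0, 1)
ORDER BY p.partition_number;
```

A non-partitioned table returns a single row with partition_number = 1.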

Below are the properties for the database. See that it already has four filegroups (plus the log). Also note that the number of filegroups need not correspond to the number of partitions.

The Files page of the database properties shows details of the files

For the demo, we’ll create a table based on the Fact.Sales fact table. This table is already partitioned and I want to show how that is done, but it also has just enough data for our analysis, and recreating a copy means we can strip out the noise from the columnstore index on the original and focus on the execution plan as it relates to our newly created partitions.

First off, create some new filegroups: five, to be exact. If you go to the folder on the C drive below after the code is run, you’ll see our new files; these are our database files. Note, however, that you don’t need multiple filegroups to use partitioning; the extra filegroups are there to further optimise performance. We could just as well have partitioned the table on a single file/filegroup.
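A sketch of the filegroup-and-file creation (the filegroup names, file names and paths are illustrative, not the exact ones from the demo script):

```sql
-- One filegroup, with one data file, per partition year
ALTER DATABASE WideWorldImportersDW ADD FILEGROUP FG2013;

ALTER DATABASE WideWorldImportersDW
ADD FILE (NAME = 'WWI_2013', FILENAME = 'C:\SQLData\WWI_2013.ndf')
TO FILEGROUP FG2013;

-- Repeat for FG2014, FG2015, FG2016 and FG2017 (five in total)
```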

Now go back to the database properties and note the new Filegroups we just created, in contrast to what we saw earlier.

What we’ve done is essentially split our database into smaller chunks (or rather, added extra chunks). Next, create the three things we need: a partition function, a partition scheme and the partitioned table. The partition function sets the boundaries that determine where our data will reside; here we’re using dates (years), but many other data types can be used. The partition scheme binds the filegroups to our partitions. For the partitioned table, the only requirement is that the partitioning column (Invoice Date Key) must form part of the primary key.
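The three objects can be sketched as follows (the object names, the boundary dates and the column subset are illustrative; the real Fact.Sales table has more columns):

```sql
-- 1. Partition function: four boundary values give five partitions, one per year.
--    RANGE RIGHT means each boundary date belongs to the partition on its right.
CREATE PARTITION FUNCTION pfSalesByYear (date)
AS RANGE RIGHT FOR VALUES ('2014-01-01', '2015-01-01', '2016-01-01', '2017-01-01');

-- 2. Partition scheme: map the five partitions to the five filegroups
CREATE PARTITION SCHEME psSalesByYear
AS PARTITION pfSalesByYear TO (FG2013, FG2014, FG2015, FG2016, FG2017);

-- 3. Partitioned table: the partitioning column is part of the primary key,
--    and the table is created on the scheme rather than a filegroup
CREATE TABLE Fact.SalesPartitioned
(
    [Sale Key]            bigint        NOT NULL,
    [Invoice Date Key]    date          NOT NULL,
    [Quantity]            int           NOT NULL,
    [Total Excluding Tax] decimal(18,2) NOT NULL,
    CONSTRAINT PK_SalesPartitioned
        PRIMARY KEY CLUSTERED ([Sale Key], [Invoice Date Key])
)
ON psSalesByYear ([Invoice Date Key]);
```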

Next, populate the table from the base table so we have data to play with. We’ll also create a secondary table using a SELECT INTO statement; this secondary table will have the same structure but won’t be partitioned, and we’ll use it for comparison.
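Both steps can be sketched like this (table and column names follow the hypothetical examples above, not the exact demo script):

```sql
-- Populate the partitioned table from the existing fact table
INSERT INTO Fact.SalesPartitioned
    ([Sale Key], [Invoice Date Key], [Quantity], [Total Excluding Tax])
SELECT [Sale Key], [Invoice Date Key], [Quantity], [Total Excluding Tax]
FROM Fact.Sales;

-- Non-partitioned copy for comparison; SELECT INTO creates the
-- new table on the default filegroup, so it is not partitioned
SELECT *
INTO Fact.SalesNotPartitioned
FROM Fact.SalesPartitioned;
```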

As the focus is on partitions (and not indexes), we won’t worry about recreating the clustered index on this table; having one would make no difference to the partitions read, which is our area of focus.

With our table populated, we can now check where our data sits. The first query shows the count of records by year, and the second confirms which partition they are located in. Notice how the records are split and stored across different partitions.
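The two checks can be sketched as below; the second uses the built-in $PARTITION function against our (hypothetically named) partition function to map each row to its partition number:

```sql
-- Record count by year
SELECT YEAR([Invoice Date Key]) AS SalesYear,
       COUNT(*)                 AS RecordCount
FROM Fact.SalesPartitioned
GROUP BY YEAR([Invoice Date Key])
ORDER BY SalesYear;

-- Which partition each row landed in
SELECT $PARTITION.pfSalesByYear([Invoice Date Key]) AS PartitionNumber,
       COUNT(*)                                     AS RecordCount
FROM Fact.SalesPartitioned
GROUP BY $PARTITION.pfSalesByYear([Invoice Date Key])
ORDER BY PartitionNumber;
```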

Now run a simple select on the non-partitioned table. Looking at the query plan, we can see the query returned 29,458 records but had to read (scan) the entire table of 228,265 records.

Now run the same query on our partitioned table, and we can see/confirm:

The table is partitioned and we only had to read from one partition

We got the same 29,458 records as above, but only read 29,458 records, as all of them were contained in a single partition (partition 5, the 2016 year), in contrast to the 228,265 records read on the non-partitioned table above.
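The key is that the predicate lines up with the partition function, so the optimiser can eliminate the other partitions at compile time. A sketch of such a query (names as in the hypothetical examples above):

```sql
-- Predicate covers exactly one partition boundary range (the 2016 year),
-- so only that partition is read
SELECT *
FROM Fact.SalesPartitioned
WHERE [Invoice Date Key] >= '2016-01-01'
  AND [Invoice Date Key] <  '2017-01-01';
```

In the actual execution plan, the scan operator’s “Actual Partition Count” property confirms how many partitions were touched.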

Now let’s expand the date parameter of our second query on the partitioned table and see what we get. This time:

The table is partitioned as before, but this time we had to read from two partitions

We had to read more rows this time across the two partitions (101,356), as our query parameter spanned two years.

What about queries with joins? The same logic applies, so you get the same outcome: the date parameter meant we only had to read from one partition, not the entire table.

Can we improve the performance further? Absolutely: just create a simple non-clustered covering index. Once created, the optimiser can perform a much more efficient index seek instead of a scan, and the number of reads drops further, from 71,898 to 61.
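A sketch of such an index (the name and included columns are illustrative; in practice you would INCLUDE the columns your query actually selects so the index fully covers it):

```sql
-- Covering index keyed on the date predicate; created on a partitioned
-- table without an ON clause, it is automatically partition-aligned
CREATE NONCLUSTERED INDEX IX_SalesPartitioned_InvoiceDate
ON Fact.SalesPartitioned ([Invoice Date Key])
INCLUDE ([Quantity], [Total Excluding Tax]);
```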

And finally, the most important thing to remember: while partitioning can be great for many reasons, it is not a panacea. In particular, if your query parameters don’t align with your partition function, the query could still read anywhere from one to all of your partitions, as the query below shows. Depending on how your data is distributed and partitioned, the extra overhead could actually make your query slower than a non-partitioned table.

SQL Server supports up to 15,000 partitions per table (since SQL Server 2012, up from 1,000 in SQL Server 2008). Clearly, Microsoft sees this as the way to handle high volumes and the proliferation of ‘big tables’. While you may not see much of a performance gain on a small table like the one in this demo (hence why we didn’t review the statistics), with just over two hundred thousand rows, you’ll certainly feel it when volumes run into the billions, as is increasingly common in many workplaces. Hopefully this blog has shown you how that is achieved.

For further reading, I strongly encourage you to read this article from Microsoft, which goes into much more detail.
