Rob Farley - Owner/Principal with LobsterPot Solutions (a MS Gold Partner consulting firm), Microsoft Certified Master, Microsoft MVP (SQL Server), APS/PDW trainer and leader of the SQL User Group in Adelaide, Australia. Rob is a former director of PASS, and runs training courses around the world in SQL Server and BI topics.

T-SQL Tuesday - Query Cost

In SQL Server, the cost of a particular plans is based largely on I/O, which makes this post a good candidate for this month’s T-SQL Tuesday, hosted by Mike Walsh who, like me, walks a straight path.

In considering I/O – the movement of data generally In and Out of disk and memory – my thoughts come very much to query cost. If I run set statistics io on, I can see the reads and writes, both logical and physical, of the queries I run, and I can use this information to gauge the impact of my query on the system.

I can also see that the Estimated Subtree Cost as indicated in the Execution Plan seems to grow with the number of logical reads. This makes sense, as the number of logical reads required by a query can really make a query costly. A logical read implies that the read might only involve examining a page of data that is already in RAM, but it should always be remembered that if the required page of data is not in RAM already, it will need to be pulled in off disk first, being shown as a physical read.

The standard line with measuring cost in terms of I/O comes down to the impact of a physical read on the system. When a physical read occurs, this involves getting data off the actual disk(s), which I liken to the process of picking a track on an old vinyl record player (the records were vinyl, not the record players). Of course, the disk controller lines up the tracks far quicker than I ever could, but the principle still applies. Compared to shifting bits in RAM, it’s incredibly slow. Much less so with Solid State Disks, but that’s another matter entirely.

But the impact of I/O is certainly not the only factor on query cost. I remember some time ago having a discussion about string splitting with a friend of mine. I had seen some posts discussing the different ways of splitting strings, with the conclusion being that CLR functions are the best, as SQL Server generally isn’t very good at string manipulation, but it led to an interesting discussion.

Take one of the standard approaches to splitting strings, such as using a table of numbers. For the purposes of the experiment, I have a table called dbo.nums(num int primary key), populating to at least 50.

Here I’m constructing a query using dynamic SQL to split strings, making a query which performs a UNION ALL of all my queries. And when I look at the cost of a batch that runs both these methods (by asking Management Studio to show me the Plans), I find that the second one is incredibly cheap. In fact, if I turn on statistics io, I can see that there is no I/O at all for the second method, whereas the first requires 12 logical reads – presumably to get at my table of numbers.

Clearly my dynamic SQL method of splitting strings is fantastically fast, and this is completely true. It is fantastic – a thing of fantasy. It’s actually really awful.

You see, as I mentioned earlier, SQL Server isn’t particularly good at string manipulation. So much so that creating this piece of dynamic SQL is very costly, won’t scale, and should never be implemented in a real environment. You can have a look at it to verify what I’m saying, but if you try to do this on a large set of data, you will very quickly find out just how poor it is.

To test it, created a table called table_of_strings (string varchar(1000)) and populated it with those four strings. I then doubled its size ten times using:

insert dbo.table_of_strings select * from dbo.table_of_strings go 10

Using the table of numbers to split the strings involved over 12,000 logical reads, but the elapsed time showed 289 milliseconds to provide the 13312 rows of results.

The dynamic SQL method used a total of just 26 logical reads, but took 34 seconds to run on my (old) machine.

I’m not suggesting you ignore the I/O-based cost of execution plans, but you should most definitely consider more than just that – particularly as Solid State storage becomes more and more commonplace.