Wednesday, July 8, 2009

This is the final part of a two part series, where I demonstrate the most commonly used methods to concatenate column values into a delimited string, http://jahaines.blogspot.com/2009/06/concatenating-column-values-part-1.html. This time around I will be focusing on the performance of each method described in part 1. The primary methods we are looking at are FOR XML PATH –SubQuery, FOR XML PATH-CrossApply, and a scalar UDF.

I will be using Profiler to capture performance metrics, as SQL Server IO stats are somewhat unreliable, in regards to UDFs. The IO statistics are unreliable for scalar UDFs because the IO does not account for the IO required to obtain the function result. It only returns the IO resulting from the main query. I will post an example of this in a future post. Let’s start by creating a new profiler trace. Open profiler and add the SQL:BatchCompleted counter. You will only need the columns textdata, Reads, and CPU. Add a database filter for ’%tempdb%’ and a filter on the textdata column. The textdata filter should be like ‘%—**%’. --** is a special string we put in our batch. This way we can ensure that we only get the statements we want. Now open management studio and make sure to discard the grid results. You can set this in query options. Query –> Query Options –> Grid –> Discard results after execution. Now let’s run our create/test script to generate our table and objects, as shown below. The main things to note in the below code is the @Batch variable. This variable dictates how many rows your table will contain. I ran the code below for 5 different table sizes (100,1000,10000, 100000, and 1000000) 10 times each. This gives us a pretty solid average, for our tests. Your results may differ from mine, as there are lots of factors that can influence the result of performance counters, but you should still be in the same ballpark. In the end I created 10 trace files, each having 10 executions of the code below. The break down was 5 trace file where a valid predicate was not used and 5 trace files where the predicate was used. Note: The predicate I am referring too is commented out in the below code. Just uncomment it to run the predicate version of the code. If you want all the test files I used, you can download them here: http://cid-6f041c9a994564d8.skydrive.live.com/self.aspx/.Public/Concatenating%20Columns%20Part%202/ConcatenateCols%7C_Pt2.zip

I am not going to show the methods used to load a trace file into a table because that is outside the scope of this post. You can download all the .sql files for this post using the link at the top or bottom of this page. Essentially, I used the system function to pull in the trace file and I inserted the data into a table in tempdb. Once the data is in tempdb, I used the pivot function to pivot the data into a format Excel could use to create a chart. From there I just had to copy the results and paste them in Excel (to generate the chart). Okay, now that I have laid all the ground work, let’s get to the numbers. I will start with the reads for the query that is using a predicate.

As you can see, the number of reads is very much alike for all three methods. The numbers of reads stays pretty constant for each query across subsequent runs, with differing distributions of data. This result makes sense because the number of reads does not need to change because the optimizer is able to seek the data, which excludes the need to look through additional pages. The best and worst performing methods in this scenario is quite obvious to see. The sub query method consistently had lesser reads, while the other methods require more reads. The cross apply method is by far the worst choice here. Next let’s look at the the reads for the query without a predicate.

These results are are closer to what one would expect. Essentially, as the number of rows increases, so does the number of reads. In this case both the reads and the table size increase by a factor of 10. The UDF and the sub query methods are neck-and-neck for this test, but if I had to choose I would say the sub query method wins. The cross apply method is by far the worst choice again. Now let’s talk about the CPU usage for each query.

No, real big surprises here. The CPU time increase exponentially based on the number of rows being processed. The results are pretty consistent with our previous results. The sub query is by far the best performing method and the cross apply is the worst.

The CPU times for the query with predicate deviates so little that it is pretty inconsequentially, but notable. I would say the UDF method is the best because it has a lesser cumulative CPU time.

So that’s it…. what have we learned? My take away is the same as it was coming into this test. You should always test each method before saying any method is better than another. Performance depends on many factors that may be different in your environment. Our ultimate goal is to create processes that works best for our environment, so we should never limit our options. If I had to choose a “winning” method for this test, I would choose the FOR XML PATH – sub query method because it consistently has lesser reads and CPU time.

4 comments:

I didn't read this blog entry until today, and it was timely because I was playing with FOR XML PATH in preparation for a blog post of my own.

Like you, I started to put together a query using CROSS APPLY because of readability, and my eyes popped out when I saw your charts on the reads and CPU time and how horribly the CROSS APPLY approach performed.

In looking at the query plan for the subquery method vs the CROSS APPLY method, it looks like the optimizer is "smarter" in processing the subquery method because it does the GROUP BY on SomeID first, and THEN it will execute the FOR XML subquery for each of those already-distinct-and-grouped SomeIDs. For your 1,000,000 row table, your random generation of SomeID created 250,000 distinct values, so it only executed the FOR XML subquery 250,000 times.

On the other hand, the CROSS APPLY method will execute the FOR XML for EVERY SINGLE ONE of those 1,000,000 rows.

You can see in your statistics that the subquery method took a quarter of the time (and reads) of the CROSS APPLY method, simply because it only did a quarter of the FOR XML executions that CROSS APPLY did.

Subscribe

Search This Blog

About Me

Adam Haines is an experienced IT professional with over 10 years of database experience in commercial and government sectors. I am a former Microsoft MVP and a Microsoft Certified Master (MCM) for SQL Server 2008. I have expertise in performance tuning, database development, disaster recovery, high availability, database architecture, data modeling, capacity planning, replication, centralized reporting, business intelligence, and ETL design and implementation.
When I am not working, I try to spend as much time as possible with my wife and our three young children.