SQL SERVER – Simple Puzzle with UNION – Part 2

Yesterday we had very easy kind of Back to Basics Puzzle with UNION and I have received tremendous response to the simple puzzle. Even though there is no giveaway due to sheer interest in the subject, I have received many replies. Due to all the request, here is another back to the basic question with UNION again.

Let us execute following three query one by one. Please make sure to enable Execution Plan in SQL Server Management Studio (SSMS).

Query 1

SELECT 1 UNION ALL SELECT 2

The query above will return following result

The query above will return following execution plan

Query 2

SELECT 1 UNION ALL SELECT 2 ORDER BY 1

The query above will return following result

The query above will return following execution plan

Query 3

SELECT DISTINCT 1 UNION ALL SELECT DISTINCT 2 ORDER BY 1

The query above will return following result

The query above will return following execution plan

Now let us look at all the execution plans together.

When you look at closely at all the resultset – they all returns the same result. When we see their execution plan they are very different from each other. Now here is the question back to you.

Question: When we add DISTINCT in Query 3 it is technically more work for SQL Server to do than Query 2. However, the execution plan demonstrates that Query 3 is using much lesser resources than Query 2. WHY?

Please leave your answer in the comment section. I will publish all the valid answer in the blog next week with due credit.

Sanjay, When executing all three together, the breakdown is 0/67/33 — the numbers will always total 100%, regardless of the number of queries. The query cost percentage can only show the time in relationship to the other queries in the batch. Hence, when a query is run by itself, it’s query cost will always be 100%.

I am also getting 0/50/50. The displayed execution plan in the article doesn’t make sense. How can there not be a Sort for query 3? How would it know the order in which to display the results given the request of “ORDER BY 1”?

Query 2 lists all available records including duplicates which could increase the output list size and the sorting time while the presence of distinct in query 3 decreases the length of output list and helps to minimize sorting time. I think distinct is taking less time than sort because it has to work on less data while sort has to arrange the output of both part of the query.

The distinct causes the two sets to be sorted as a side-effect. Since the sets are one row each in this case, that sorting is really fast. Then taking two sorted lists (even sets with one than one row), and making one big sorted list out of them requires just merging them — going through them linearly, and taking whichever is the next lowest.

Hi, If you execute all the three queries together in a single session, then 100%,67%,33% cost comes. If you execute the 3rd query separately, 100% cost comes. That means when executing in the same session, the query uses the previously executed plan for the current execution. Correct if my answer is wrong.

No. When executing all three together, the breakdown is 0/67/33 — the numbers will always total 100%, regardless of the number of queries. The query cost percentage can only show the time in relationship to the other queries in the batch. Hence, when a query is run by itself, it’s query cost will always be 100%.

Query 3 uses the Merge Join operator while Query 2 uses the sort operator. A merge join operator is used on the two tables whose output columns have been presorted. The sort operator uses the concept of looping to get its output which requires an extra thread to perform what we call ‘Rebinding’. This increases the I/O Cost drastically. Hence the difference.

Here, query 3 taking lower cost than 2 as there is only 1 row for selection. Sorting individual rows in q3 is not taking any cost. Now, q3 uses merge and q2 uses sorting for ordering the result and merging is faster than sorting as merge is being applied on presorted lists.

If we select from a un-indexed table with 10 rows. then results are 13/30/57 (my sample), which are consistent. Since, q3 includes sorting individual results first so q3 cost is high.

I have to guess that there is a big difference between the Sort operator and the Merge Join operator. Looking at the properties of both I see an IO cost and a rebind associated with the sort operator that does not exist for the Merge Join. I have to guess that the Sort is internally doing some kind of loop that requires IO.

HI PINAL, IN the SECOND query,”SORT” operation takes place and we get estimated I/O cost is “0.0112613” IN the THIRD query,”MERGE JOIN” operation takes place i.e.., it takes two already sorted input tables exploitting the sort order.and we get estimated I/O cost is “0”

in other word ,in the second query both physical and logical operations are “SORTING” whereas,IN the third query physical operation=MERGE JOIN(takes already sorted input table) and logical operation=CONCATINATION

Hi Pinal, Because of explicit SORT operation on final result set causes second query more expensive … Whereas 3rd query doesn’t require any SORT operation externally… Note that “Merge join itself is very fast, but it can be an expensive choice if sort operations are required.” means if there is no sort operation required then MERGE JOIN is very fast… The above two are the reasons to have less cost for 2nd query

If I look at the plan carefully, the Query 2 uses the Sort operation and the Query 3 uses the Merge Join. There is an extra cost on performing Sort operation as it does sorting at both Physical and Logical operations. The Merger Join operator matches the rows from two sorted input tables thus at physical level its merging and logical level its concatenating the operation thus exploiting the sort order.

HI PINAL, the SECOND query,”SORT” operation takes place, hence the sorting took place in tempdb and we got the estimated I/O cost which is greater than 0 this make the query use little more resource than other query

In the THIRD query the select statement is already sorted with distinct operator and then the sorted result set is merged using merge JOIN concatenation, since there is no any writes or reads required here so this query take little amount of resource

Pinal Dave is a technology enthusiast and an independent consultant. He has authored 11 SQL Server database books, 21 Pluralsight courses and have written over 3800 articles on the database technology on his blog at a http://blog.sqlauthority.com. Along with 14+ years of hands on experience he holds a Masters of Science degree and a number of database certifications. For any SQL Server Performance Tuning Issue send email at pinal @ sqlauthority.com .

Nupur Dave is a social media enthusiast and and an independent consultant.