The select list of a subquery introduced by EXISTS almost always consists of an asterisk (*). There is no reason to list column names because you are just testing whether rows that meet the conditions specified in the subquery exist.

Don't believe me? Try running the following:

SELECT whatever
FROM yourtable
WHERE EXISTS (SELECT 1/0
              FROM someothertable
              WHERE a_valid_clause)

If the SELECT list were actually being evaluated, this would throw a divide-by-zero error. It doesn't.

3) Case:

a) If the <select list> "*" is simply contained in a <subquery> that is immediately contained in an <exists predicate>, then the <select list> is equivalent to a <value expression> that is an arbitrary <literal>.

The EXISTS trick with 1/0 can even be extended to this: SELECT 1 WHERE EXISTS(SELECT 1/0) ... which seems a step more abstract, as the second SELECT has no FROM clause.
– whytheq, Sep 13 '12 at 10:50

@whytheq - Or SELECT COUNT(*) WHERE EXISTS(SELECT 1/0). A SELECT without a FROM clause in SQL Server is treated as though it were accessing a single-row table (e.g. similar to selecting from the dual table in other RDBMSs)
– Martin Smith, Nov 8 '12 at 21:22

@MartinSmith cheers - so the point is that the SELECT produces a one-row table before it does anything else, so even though 1/0 is rubbish, the one-row table still EXISTS?
– whytheq, Nov 9 '12 at 9:14

Has this always been the case, or is it an optimization that was introduced in a particular version of SQL Server?
– Martin Brown, Feb 9 at 11:13

This shows that SQL Server was able to use the narrowest index available to check the result, even though the index does not include all columns. The index access is under a semi join operator, which means that it can stop scanning as soon as the first row is returned.

So it is clear the above belief is wrong.

However, Conor Cunningham from the Query Optimiser team explains here that he typically uses SELECT 1 in this case, as it can make a minor performance difference in the compilation of the query.

The QP will take and expand all *'s early in the pipeline and bind them to objects (in this case, the list of columns). It will then remove unneeded columns due to the nature of the query.

So for a simple EXISTS subquery like this:

SELECT col1 FROM MyTable WHERE EXISTS
(SELECT * FROM Table2 WHERE MyTable.col1 = Table2.col2)

The * will be expanded to some potentially big column list, and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed.

"SELECT 1" will avoid having to examine any unneeded metadata for that table during query compilation.

However, at runtime the two forms of the query will be identical and will have identical runtimes.

I tested four possible ways of expressing this query on an empty table with various numbers of columns: SELECT 1 vs SELECT * vs SELECT Primary_Key vs SELECT Other_Not_Null_Column.

I ran the queries in a loop using OPTION (RECOMPILE) and measured the average number of executions per second. Results below
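As a rough sketch of that methodology (the table name and iteration count here are my assumptions, not the exact harness used for the measurements):

```sql
-- Hypothetical harness: time N recompiled executions of one form.
-- dbo.TestTable and the loop count are placeholders.
DECLARE @i INT = 0, @dummy INT, @start DATETIME2 = SYSDATETIME();
WHILE @i < 10000
BEGIN
    SELECT @dummy = 1
    WHERE EXISTS (SELECT * FROM dbo.TestTable)
    OPTION (RECOMPILE); -- forces a fresh compilation each iteration

    SET @i += 1;
END;
-- Total elapsed time; divide iterations by this for executions/sec.
SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;
```

Repeating the loop with SELECT 1, SELECT Primary_Key, and SELECT Other_Not_Null_Column in place of SELECT * gives the four variants being compared.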

As can be seen, there is no consistent winner between SELECT 1 and SELECT *, and the difference between the two approaches is negligible. The SELECT Not Null col and SELECT PK variants do appear slightly faster, though.

All four of the queries degrade in performance as the number of columns in the table increases.

As the table is empty, this relationship seems explicable only by the amount of column metadata. For COUNT(1), it is easy to see from the below that it gets rewritten to COUNT(*) at some point in the process.
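One way to observe the rewrite yourself (table name is a placeholder) is to compare the execution plans of the two forms; both should show the same aggregate operator, computed as Count(*):

```sql
-- Placeholder table name; compare the execution plans of these two.
-- Both should show the same aggregate, defined as Count(*).
SELECT COUNT(1) FROM dbo.TestTable;
SELECT COUNT(*) FROM dbo.TestTable;
```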

I found that in the cases where the table has 1,024 columns, most of the time the call stack looks something like the below, indicating that it is indeed spending a large proportion of the time loading column metadata even when SELECT 1 is used (in the case where the table has one column, randomly breaking didn't hit this bit of the call stack in 10 attempts).

So one might speculate that the minor apparent difference when using SELECT some_not_null_col is that it only winds up checking permissions on that specific column (though it still loads the metadata for all). However, this doesn't seem to fit the facts, as the percentage difference between the two approaches, if anything, gets smaller as the number of columns in the underlying table increases.

In any event, I won't be rushing out and changing all my queries to this form, as the difference is very minor and only apparent during query compilation. Removing the OPTION (RECOMPILE) so that subsequent executions can use a cached plan gave the following.

There is no difference in SQL Server, and it has never been a problem there. The optimizer knows that they are the same. If you look at the execution plans, you will see that they are identical.
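You can check this yourself (the table and column names here follow the earlier example and are placeholders): run both forms with the actual execution plan enabled and compare them.

```sql
-- Placeholder names; the two statements compile to identical plans,
-- with the EXISTS implemented as a semi join in both cases.
SELECT col1 FROM MyTable
WHERE EXISTS (SELECT * FROM Table2 WHERE MyTable.col1 = Table2.col2);

SELECT col1 FROM MyTable
WHERE EXISTS (SELECT 1 FROM Table2 WHERE MyTable.col1 = Table2.col2);
```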