March 10, 2014

I’m participating in this month’s T-SQL Tuesday. The host – handsome genius – has asked us to argue against a popular opinion. So the opinion I’m tackling today is that SELECT * should always be avoided.

I’ve heard that a lot. For example, it seems to be a popular question on Stack Overflow. The consensus is that SELECT * is discouraged in favor of SELECT col1, col2, col3 where you select the columns you need and no others.

I’m going to look at some of the reasons that are given to support this advice. I’m also going to qualify when it might not matter so much.

Unused columns cause excessive load on the database.
Touché. This is true, SELECT * often causes SQL Server to use the clustered index because it’s usually the only covering one. This can waste an opportunity to use a more efficient index.

However, even when us developers follow the “Query only what you need” advice, our SELECT queries often do in fact need every single column. In this case, SELECT * is fine.

Columns are returned in an unknown order
That’s true. The fear is that an application will contain a bug when it depends on a column order that is incorrect. Well simply don’t do that. Most applications are able to retrieve row values in the record set by column name rather than column position. Identify columns by name. Then the column order doesn’t matter.

You can’t search code for the use of a column
That’s an interesting one. Say I’ve got a USERS table with a Title column. This column is getting dusty and so we plan to deprecate it and ultimately remove it.

It would be great to search our code base and find where this column gets used. Presumably the code that contains SELECT * would be missed in the search but any code that contains SELECT ... Title would be found.

But how big a problem is this? The only trouble is when an application uses SELECT * and then after it gets the results, it takes that data set and looks for the column Title by name. If that’s the case, then you will probably find that instance of the Title string in your code search.

The development effort is not really impacted (because SELECT * doesn’t need to change). The testing effort is not impacted terribly either. I can’t think of a scenario where Ctrl+F column name is the only thing standing in the way of a bug getting into production. If you can think of one, then it’s probably not the only quality issue that needs attention.

TL;DR

I don’t mind SELECT * when I’m using every column and when I don’t depend on the order of the columns.

I sometimes think the standards people who decide such things keep the SELECT clause around for these EXISTS clauses because it freaks people out when it’s not there (kind of like the headless horseman).

@Andy, You’re right, it would be a problem. So don’t use SELECT * in that case.

I think my argument against “SELECT * should always be avoided” does not amount to “SELECT * should never be avoided.” Your example is certainly one case where SELECT * simply does not work.

I still think there are still plenty of one table queries that are simple enough to safely make use of SELECT *. These queries are so common. Especially in OLTP applications that are object oriented, or close to something like that.

You’re also presuming you know the future.
You need to change your statement to “I don’t mind SELECT * when I’m using every column and when I don’t depend on the order of the columns, and no-one will ever add another column to the table”

So you’ve got a table with two integer fields and use both of them.
And you performance test your application’s network traffic based on this.
Two years in the future someone adds a description column, with average 100 characters in it.
And your network traffic goes through the roof, because your calls are bringing back that column every time, as well as the two integer columns you are actually using.

Because you can’t predict the future of the database table(s), you shouldn’t use SELECT *

Sorry Steve, I don’t buy it. I still think there are cases where it’s safe to use SELECT *.
Three things about your specific scenario.

(1) Future developers will expect that any changes made to the system may result in changes to performance. Your scenario is no exception. Hopefully there are QA practices in place to catch scenarios. But perhaps we’re unnecessarily exposing ourselves to risk. Take it on a case by case basis.

(2) Network traffic? I have never in my career faced a network traffic bottleneck. I’m not saying that a table change couldn’t cause a performance issue, but I’d be curious to learn that the bottleneck turns out to be network traffic.

(3) The (int,int) -> (int,int,string) is more vulnerable to the issues you mention than say Adventurework’s Production.Product. This table has 25 columns. If I have a method which queries all 25 columns, then it’s likely that it will need 26 columns after the addition of a new column.

The problem with point 1 is not about the QA practices in place to catch it.
It’s the amount of work required to cope with it once you have caught it.

So if you have QA practices in place not to use SELECT * when the applications are being written, this is very little extra work.
And it is far less likely that when you add an extra column you will need to change an application.

On the other hand, if you don’t follow this rule, then it is far more likely you have to change an application.
And changing an application that was written more than a few months ago, and is in production, is almost always a LOT of work, and costly in time and money. (re-compilation, testing, upgrading etc.)

What you’re doing by never using SELECT * (and enforcing this by QA practices) is saving yourself a lot of potential work when the database changes, and adding a (very) little more work in coding.

I always avoid SELECT * is in views. If the table changes and you forget to recompile the view then at best you’ll get an error. At worst all your columns will be shifted and returning the wrong column names.

There are moments where select * doesn’t worry me. For example, if I have full control of my DB Schema & DDL, I know the column numbers and I know they would not change. In this case, I try to create GUI that would be dynamically created and the forms will be created based on the data and the field positions. So, doesn’t matter to me. This way, I save time by not needing to change code and involved developers.

Now, in the case that I have to use an specific number of fields, instead of all the fields, then, select * does not makes sense.

A good read. At my workplace we have no policy in place for “SELECT *”. I do use it in run-once queries, but I try to avoid it in stored procedures if possible.

One case where it is a huge pain in the behind that we have run into is in SSIS packages. You make the SSIS package with a “SELECT *” in it and add a column to a table and everything breaks. Mind you, SSIS gives you warnings in your SSIS package if table structure changes, but we had a few fail due to SELECT *’s.
I am the primary DBA at my work and had released the change after I had been told the table structure changes worked on both dev and test environments, and I saw the tests and it was returning the expected data, so I pushed it live. The SSIS package wasn’t updated as the developer thought it was outside of the scope of the changes. I didn’t check it as we have a lot of SSIS packages and checking them all would have been a huge pain (I have a method now for doing this that is semi-automated, but still a slow and painful process).

TL;DR – SELECT * has its place. It isn’t always good and isn’t always bad. The same can be said about (almost) any SQL commands… Even database shrinking.

You’re right Brian, SSIS is very fussy about the shape of a resultset. If a SELECT * gave five columns during design-time, then SSIS will freak out if it ever sees six.
I would support any best practice or policy that says “Never use SELECT * with SSIS”.

I like your TL;DR, it’s exactly the point I wanted to make when I picked the topic for the T-SQL Tuesday.