How To Prevent SELECT * The Evil Way

SELECT * FROM…when administrators see that from developer code, we generally tend to cringe. Why? In a nutshell, it’s terrible on a few fronts. First, typically that SELECT * FROM that gets written (a lot of the times) lacks a WHERE clause. What’s the problem? Well, that pulls back every single row from the table.

Sure, that may not be too bad on a table with a few hundred rows but what about on a table with millions? That could cause a lot of performance problems since you’re trying to read all the data off disk (again, potentially). Secondly, do you really need all those rows and/or columns? It’s a waste of time and resources to pull back every column from a table if your application is only going to be using a few of them anyways.

So how do we prevent this? Well I recently learned an extremely evil way of preventing such a query. I’d like to start off with, this was NOT my idea. I learned this trick from Adam Jorgensen (Blog | Twitter). I’d also like to add this disclaimer:

DO NOT JUST GO DO THIS IN YOUR PRODUCTION ENVIRONMENT! I am not responsible for whatever evil you turn loose upon your environment. Always test things out in a development environment first and get proper approvals before making any changes.

Pure Evil Method

This method is actually evil in its simplicity. What we’ll be doing is adding a new column to the existing table. The “trick” is that this will be a computed column whose formula will cause an error, specifically a divide by zero error. As shown in screenshot below, create the new column on the table and call it something obvious like ‘DoNotSelectAll’. In the Column Properties window, under the Table Designer section, there is a property called Computed Colum Specification. In the formula section, enter (1/0). Save your table.

Now if I try to do my SELECT * on this table, I’ll get this lovely message:

Alright, we learned our lesson, now we’ll explicitly name our columns that we need:

Now this last query worked but notice how I didn’t put a WHERE clause so it pulls back all rows anyways? Yup, your users can still pull back everything, but at least they’re not doing a SELECT *. Also keep in mind, if you’re used to right-clicking that table in SSMS and selecting TOP 1000, with this column in place it will error for you as well. What are your alternative options?

LESS EVIL METHODS

Abstraction

Another way to control this kind of behavior is by not letting users hit base tables at all. You could create Views that have queries in them that limit rows returned. This way a user can do a SELECT * on a view, but the underlying code of the view itself is limiting row returns.

Depending on your situation, this could work and it could not. If the user needed very specific data returned that wasn’t in that limited pool of results could adversely affect whatever process they’re using the data for.

Another option is wrapping everything in stored procedures and granting users access to executing stored procedures rather than querying tables and views. On the one hand, could be good since you’re encapsulating the code. Users can pass parameters to stored procedures so you could make the queries somewhat dynamic.

Handbrake

In SQL Server 2008 they introduced a feature called the Resource Governor. This feature allows you to throttle resources on queries based on custom functions and groupings you specify. Yes, it’s an Enterprise Edition feature but it can be well worth it if you’re having resource-related issues due to runaway queries.

Now this feature will NOT prevent users from doing SELECT * –type queries, however you can throttle how much resource is allocated toward a query so you can at least control how badly it’ll affect you.

Security

My friend Brian Kelley (Blog | Twitter) will probably appreciate this one. Be stringent with the accesses you grant! Grant users only the accesses they need. Also, ff you don’t want users banging against your transactional systems directly, think about setting up a dedicated/isolated reporting environment and point the users there instead.

The reporting box you stand up doesn’t have to be (necessarily) as beefy as your transactional system and you can setup customized security on that database. This is especially helpful for when the transactional system is a vendor application which you can’t make any modifications to the code.

Do you have any other suggestions/tricks to help prevent crazy user queries? Let’s hear it in the comments!

Post navigation

13 thoughts on “How To Prevent SELECT * The Evil Way”

Code reviews. I started insisting that all SQL code that gets deployed goes through review by a DBA team; we highly discourage SELECT * and otherwise inefficient code, to protect the platform and to encourage good development practices. Very few people can access the live system from a terminal server, so security is a big piece. As in, hugemongous big (that’s a technical term). I have not yet implemented the RG, but that is a recommendation I’ve heard from several experts including yourself.

So many of my devs are using development tools that generate SQL statements that they’ve lost their SQL writing mojo. And too many of these tools default to SELECT *. Sadly, I’m not sure if I took those rights away if we’d be able to do any development.

This is so sexy I am very jealous that you thought of it. Damn you! And Bless You!

Per my tweet from last week, I have managed to set up a DDL trigger that looks for such a computed column on table create/alter and will roll back if it doesn’t exist. Still working on making it work for view definitions.

As a thank you, if you go the “select only from views” route, you can add this WHERE clause to your view definition:

True and I had started to write something along those lines in the article but forgot to actually WRITE that part about the WHERE clause. Great points! It’s all about education in the end but this is post is simply a guide to help prevent general SELECT * queries. Again, this isn’t an end-all, be-all solution, just a general evil idea.

So what happens when someone changes the table structure and your app doesn’t get updated to know about it? Using ordinals for retrieving columns is the best for performance in .NET so that SELECT * just broke your code because someone inserted a new column in the middle of the table. Even if you want to return all columns, they should be named to provide an exact ordinal position to prevent the .NET code from having to be refactored if someone does change the table.

“So what happens when someone changes the table structure and your app doesn’t get updated to know about it?”
– one of two things usually… If they have removed a column then the program wont run even WITH defined columns and if they have added a column (barring any performance issues) nothing breaks. And how often does ‘someone’ alter your table definitions without dev or dba knowledge? If you work in a smaller environment/company (as I do) you might find that the table definitions dont change unless the program is also changing, hence a re-write is needed anyway. Lastly, what about my super clever .NET routine that handles a dynamic table definition and in fact is built for that very reason?
TLDR; As Allan correctly pointed out, CONTEXT is everything. To say NEVER use Select * is just as wrong as saying ALWAYS use Select *.

Instead of limiting rows returned from a view, create a VIEW with computed column
CREATE VIEW my_view
AS
SELECT *,1/0 as DoNotSelectAll FROM source_table
Now allow users to access this view instead of source_table and SELECT * FROM my_view will throw a divide by zero error. This way you don’t need to limit number of rows returned from a view.