Monthly Archives: July 2012

SELECT * FROM…when administrators see that from developer code, we generally tend to cringe. Why? In a nutshell, it’s terrible on a few fronts. First, typically that SELECT * FROM that gets written (a lot of the times) lacks a WHERE clause. What’s the problem? Well, that pulls back every single row from the table.

Sure, that may not be too bad on a table with a few hundred rows but what about on a table with millions? That could cause a lot of performance problems since you’re trying to read all the data off disk (again, potentially). Secondly, do you really need all those rows and/or columns? It’s a waste of time and resources to pull back every column from a table if your application is only going to be using a few of them anyways.

So how do we prevent this? Well I recently learned an extremely evil way of preventing such a query. I’d like to start off with, this was NOT my idea. I learned this trick from Adam Jorgensen (Blog | Twitter). I’d also like to add this disclaimer:

DO NOT JUST GO DO THIS IN YOUR PRODUCTION ENVIRONMENT! I am not responsible for whatever evil you turn loose upon your environment. Always test things out in a development environment first and get proper approvals before making any changes.

Pure Evil Method

This method is actually evil in its simplicity. What we’ll be doing is adding a new column to the existing table. The “trick” is that this will be a computed column whose formula will cause an error, specifically a divide by zero error. As shown in screenshot below, create the new column on the table and call it something obvious like ‘DoNotSelectAll’. In the Column Properties window, under the Table Designer section, there is a property called Computed Colum Specification. In the formula section, enter (1/0). Save your table.

Now if I try to do my SELECT * on this table, I’ll get this lovely message:

Alright, we learned our lesson, now we’ll explicitly name our columns that we need:

Now this last query worked but notice how I didn’t put a WHERE clause so it pulls back all rows anyways? Yup, your users can still pull back everything, but at least they’re not doing a SELECT *. Also keep in mind, if you’re used to right-clicking that table in SSMS and selecting TOP 1000, with this column in place it will error for you as well. What are your alternative options?

LESS EVIL METHODS

Abstraction

Another way to control this kind of behavior is by not letting users hit base tables at all. You could create Views that have queries in them that limit rows returned. This way a user can do a SELECT * on a view, but the underlying code of the view itself is limiting row returns.

Depending on your situation, this could work and it could not. If the user needed very specific data returned that wasn’t in that limited pool of results could adversely affect whatever process they’re using the data for.

Another option is wrapping everything in stored procedures and granting users access to executing stored procedures rather than querying tables and views. On the one hand, could be good since you’re encapsulating the code. Users can pass parameters to stored procedures so you could make the queries somewhat dynamic.

Handbrake

In SQL Server 2008 they introduced a feature called the Resource Governor. This feature allows you to throttle resources on queries based on custom functions and groupings you specify. Yes, it’s an Enterprise Edition feature but it can be well worth it if you’re having resource-related issues due to runaway queries.

Now this feature will NOT prevent users from doing SELECT * –type queries, however you can throttle how much resource is allocated toward a query so you can at least control how badly it’ll affect you.

Security

My friend Brian Kelley (Blog | Twitter) will probably appreciate this one. Be stringent with the accesses you grant! Grant users only the accesses they need. Also, ff you don’t want users banging against your transactional systems directly, think about setting up a dedicated/isolated reporting environment and point the users there instead.

The reporting box you stand up doesn’t have to be (necessarily) as beefy as your transactional system and you can setup customized security on that database. This is especially helpful for when the transactional system is a vendor application which you can’t make any modifications to the code.

Do you have any other suggestions/tricks to help prevent crazy user queries? Let’s hear it in the comments!

Being a DBA is like being a train conductor. One of the biggest responsibilities is making sure all jobs are running as expected, or making sure “all the trains are running on time” so to speak. As my partner-in-crime Devin Knight (Blog | Twitter) posted earlier, we have come up with a solution to identify and alert for when SQL Agent jobs are running longer than expected.

The need for this solution came from the fact that despite my having alerts for failed agent jobs, we had a process pull a Palin and went rogue on us. The job was supposed to process a cube but since it never failed, we (admins) weren’t notified. The only way we got notified was when a user finally alerted us and said “the cube hasn’t been updated in a couple days, what’s up?”. Sad trombone.

As Devin mentioned in his post the code/solution below is very much a version 1 product so if you have any modifications/suggestions then have at it. We’ve documented in-line so you can figure out what the code is doing. Some caveats here:

This solution has been tested/validated on SQL Server 2005 (SP4) and 2008 R2 (SP1).

To setup this solution, create an Agent job that runs ever few minutes (we’re using 5) to call this stored procedure

FYI, I set the mail profile name to be the same as the server name. One – makes it easy for me to standardize naming conventions across servers. Two – Lets me be lazy and code stuff like I did in the line setting the mail profile name. If your mail profile is set differently, make sure you correct it there.

Thresholds – This is documented in code but I’m calling it out anyways. We’ve set it up so that any job whose average runtime is less than 5 minutes, the threshold is average runtime + 10 minutes (e.g. Job runs average of 2 minutes would have an alert threshold of 12 minutes). Anything beyond a 5 minute average runtime is controlled by variable value, with default value of 150% of average runtime. For example, a job that averages 10 minute runtime would have an alert threshold of 15 minutes.

If a job triggers an alert, that information is inserted into a table. Subsequent runs of the stored procedure then check the table to see if the alert has already been reported. We did this to avoid having admins emailed every subsequent run of the stored procedure.

CODE (WARNING: This code is currently beta and subject to change as we improve it)