Top 10 T-SQL Code Smells

I enjoy forums, and lately I've been really enjoying Stack Overflow. The few, but important enhancements the Stack Overflow team has made to the forum notion, such as easily ranking solutions, separating solutions and commentary into separate channels, and so on, make it much more usable and interesting than a traditional forum. In an environment like SO it's interesting to see all the posters' varying skill sets all mashed up together, from people you can see are adapting what they know to a new problem, to obvious newbies, to those who clearly are not "getting" some concept. I always learn a lot both by reading and by posting the odd solution.

Then again, there are multiple solutions to every problem. Many, maybe hundreds, of solutions are correct in the narrow sense of producing the right result. Of those many correct solutions, though, some stand out as clearer, or more elegant, or more flexible, and hopefully one wants to surface those even above the ones that are just correct. This is something I learned when I was in Architecture school [too many] years ago. A complex problem might have a million solutions, a thousand good solutions, and a handful of really exceptional solutions.

"Code Smell" has become a fashionable term for this, but you can call it whatever you like -- best practice, symptoms of design problems, refactoring opportunities. So, what is it that makes the difference? In SQL, code smells seem to emanate mainly from code that:

Defeats the optimizer. The query optimizer does a good job most of the time, though it can make errors occasionally. However, un-optimizable code -- code that defeats the optimizer or prevents it from working at all -- seems like a bad idea right across the board.

Defeats the transactional properties of the server. Same idea: transactions and ACID are good, and you already paid for them if you have SQL Server, so why just turn around and circumvent them?

Defeats other properties of the SQL language that would be useful, such as closure.

Assumes every operation will always succeed.

Bends SQL to some task for which it is not at all suited.

Works around or optimizes for a problem that doesn't actually exist.

In any case, these are some things I watch out for, and when I find them, I try look deeper and see if a problem is lurking about. There could be a problem, or perhaps not. In no particular order:

Loops. Loops of any kind, cursors or otherwise, are definitely code smell in SQL. Sometimes they are genuinely needed, sometimes they provide a legitimate workaround for a problem that really can't be done with sets, but very often they are just a nice Update or Select statement waiting to be written. In C# iterating over objects is normal. In SQL iterating over rows probably means you are doing something wrong. (There are some exceptions, like the well-known Running Totals problem, etc.)

Loops basically defeat the optimizer. If you have a nice 32 core box with 128 GB of RAM (whatever) the optimizer would happily, automatically rewrite and parallelize many queries for you, at no charge. Explicitly looping means you've probably just blown the opportunity to use that. WHILE is not really better than a cursor; that's a myth. This gets filed under Defeating the Optimizer.

INSERT ... EXEC is a workaround that indicates the stored procedure probably ought to have been a view (or just a plain ol' select statement, or derived table) and not a stored procedure. File under both Closure and Defeating the Optimizer. By chunking a process up into nested procedures and temp tables, you limit the opportunity to optimize across the whole operation. Unless you have some ultra-performant system and you need all the esoteric features of forced plans around stored procs for your select statement, and most people don't, delivering results from views gives a lot of benefit. Views are "expanded" by the optimizer, and the thing that they return is defined as a predictable, single result set. They can easily be nested, while procs can't. It's what views were designed for right from the beginning.

CREATE PROCEDURE mySproc AS BEGIN SELECT ... FROM ... WHERE ... END.

Procs that have no other purpose than to return a result set should probably be views. Closure again. The reason is that the "shape" of what is returned from a stored procedure is undefined. It could be any result set, or several, of any form. I have a hunch this is why the old system procs like "sp_help" are being turned into Dynamic Management Views as we move to new versions of SQL Server. And thank goodness for that. Views still allow you to encapsulate logic and refactor later, and have essentially the same security advantages as Procs as long as your app parameterizes queries to protect from SQL injection. Elaborate chains of IF statements inside a proc are a further indication that something is amiss.

Three-part object names. Four-part object names. For that matter, One-part object names. (Hmm, maybe that's three items.) If your app uses five databases instead of one, and executes cross-database queries, that's a design smell. Are those various databases transactionally consistent? Do they need to be? How would database mirroring or log shipping work for you, when one DB fails over to another box? How yould you change the name of one of the databases, if that name is sprinkled about in your code? Or a linked server name? Multiple databases like this, in my experience, usually want to be either schemas inside one database, if they need to be consistent, or completely separate databases that the app connects to independently, for the purpose of sharding or scale-out, if they don't. A connection string from an app specifies server and database, and it's best to stick to that model, and not hop from one database to another at the SQL Server end of the pipe.

One-part names, with no schema qualifier, will get to be a problem later, after you've combined all those databases into one database with different schemas :-).

Fishing around inside a column for values, as in ... WHERE SUBSTRING( t.myColumn, 1, 3 ) = 'ABC'. Sometimes you need to peek into a column, but it should be the exception and not the rule. Are the values you're storing atomic?

EXEC xp_cmdshell.Even the FBI says this is a bad idea. No kidding. SQL Server is not good at OS operations. If you must go there, at least go with CLR.

Roll-your-own locking. I've seen a few apps that use values within tables to "flag" items as locked. All I can say is, please get to a phone and call the American Society of Wheel Reinventors if you think you're going to write and then maintain a better locking mechanism than the one that's already baked into SQL Server. Seriously.

[Here I mean short-duration, normal, transactional SQL locking, not other types of longer-term "locks" that might be needed for an app to flag items as in progress, or for workflow, etc. Those situations need values in tables AND, incidentally, short SQL transactions just at the moment the data is actually modified. No, I wouldn't want to leave a SQL transaction open for two days while an invoice gets approved...]

SELECT * FROM <whatever>. INSERT INTO myTable VALUES ( 1, 2, 3 ). Depending on the order or number of columns is always a bad idea. Put in a column list; it's worth it. Corollary: figuring out how to re-order the columns in an existing table. The column order should not matter, ever.

Code having no error handling flow control or transactions. If you don't see BEGIN TRANSACTION or BEGIN TRY or even @@ERROR, then you definitely have some issues to investigate.

Code that drops and creates objects in the database as part of the normal operation of an app. A database schema should be pretty static, with DDL (e.g. Drop Table, Create Table, etc.) happening only in installers and patches. If you find a stored proc dropping and recreating tables it's definitely something to check over. Are you losing indexes and statistics when those objects disappear? Permissions? What if the DBA has has to change the indexing to tune for performance?

I also dislike hungarian notation (dbo.tbl_MyTable, dbo.vw_trans), but if I am honest I have to admit that is really a personal stylistic preference and can't be Smell Number 11. Plus, it's a Top 10 list, right? Anyway, if you think I'm right or wrong with these, let me know. I'm always eager to learn from someone else.

["Code Smell," originally from Refactoring: Improving the Design of Existing Codeby Martin Fowler, et. all, has become a fashionable term for correct-but-not-optimal OO code.]

Edit: #7 I intended to refer to roll-your-own transactional locking, distinct from longer-timespan "business" locks or flags that apps legitimately use to mark items as in progress, or for control of workflows, etc. Thanks to commenters for pointing that out.

I think Rule #7 can be justified from an application standpoint. Sometimes "locked" isn't so much as the row is being updated, but maybe someone actually went into that record in the application and is editing it there. Example...

Person 1 opens record 200

Person 2 opens record 200

Person 1 changed Name

Person 2 now has stale data in their form and changes Address

Person 1 saves changes

Person 2 saves changes

Person 1's changes were lost, because Person 2's changes were submitted with the stale data. There are probably better ways of taking care of this, but locking someone out of the record when someone else is in it is a relatively easy way to handle this. When Person 1 opens the record it updates the row and sets it to locked. Person 2 then tries to go into the same record and it says "Sorry, this row is already being edited."

Let me know what you guys think. Like I said, maybe there is a better way of doing that.

I loved reading most of the post, it's a great one. Yet I cannot agree with #9 "Code having no error handling flow control or transactions." Because error handling in T-SQL is a bad joke, it is full of loopholes, in many cases it makes a lot of sense to handle errors in another layer developed in a better language.

Regarding views as the alternative to stored procedures, for me inline UDFs are an even better alternative as the take parameters yet still are macros just like views.

Agreed with Andrew Wickham. SQL Server has no built in support for meta-transactional locks (i.e., at the business process level). This is something that you MUST roll yourself. There are better and worse ways to do it, of course. Blatant plug: You might want to refer to my chapter on the topic in "Expert SQL Server 2005 Development", which shows the methods I've come up with for solving the various problems that can crop up when you don't think through the solution carefully enough.

Thanks, guys, for the feedback. Andrew and Adam - with true "business side" locks I can see your point, and that'd definitely be justified. However, I just got through locating a bug in a vendor-supplied app where they were doing a standard "sequence table," and instead of using a transaction to lock the underlying row, they used a flag in a column. It was slow, and painful to watch, and later, unsurprisingly, became a concurrency bug. So, I still would call the "roll-your-own locks" notion something to watch out for (smell) but not something that should never be done. Maybe I loaded on the sarcasm too thick in the post :-).

The reason is that a view's "output" is the same as a table's "output" -- one result set with known columns -- where a proc can spit out just about anything, or even different things depending on conditions, etc. That means views can be used in a select statement just like tables, but procs, even if they contain only a select statement, cannot.

Alexander - check my understanding: do you mean that transactions and error handling should be used, but that it's more convenient / safer to use a client-side library (C#, java, whatever) to actually write the code? I can definitely see that. In that case, a profiler trace (I think) still shows transaction boundaries and so on, as that client library directs the action at the server, and the error handling is present, just in the app code and not in raw T-SQL. What I was going after in terms of smell is those systems that lack error handling or transactions altogether...

I probably should have said, "if you don't see transactions or error handling in the DB, make certain your app is doing that work instead."

Question about #10. What's your opinion about creating temp tables in a sproc so that you can index it and reuse it in downstream code. I frequently find that this can produce faster results than subqueries.

While can be useful in debugging, if your production code needs to read uncommitted transactions, something is probably wrong.

And re: 7.Roll-your-own locking, I'm not sure what you're advocating. A business transaction is different from a SQL transaction. For example, I might want to "lock" a support ticket while a user edits it, which may take minutes to hours. I don't want anyone else to edit that ticket, but I certainly don't want to leave a SQL transaction open that long.

Bob - agreed that temp tables _can_ be a performance optimization for some scenarios; to me the issue is more around use of INSERT #temp ... EXEC constantly as a default design pattern, or people who always use temp tables out of habit, where it hurts performance. #Tables usually add overhead, and only sometimes make up for that overhead, plus provide net gain in performance.

I think it's better to apply optimizations selectively, when you have a problem or when you can really predict you will have a problem. So I would look at a specific case, see if a basic query would work and scale OK, then use a temp table only if needed and really helpful. In any case, most of these Top 10 are not hard-and-fast rules, they are just symptoms to watch out for.

Al - exactly right. I worry when a SQL transaction was needed and appropriate, but was missing, which is different than the type of business-related "locking" that you and Adam and Andrew describe. It'd be handy if there were a word other than "lock" for that. "Meta-transactional locks? (Adam)

Well stated. I probably use them more often than needed. Often it's to simplify the logic: "Rule of Clarity", etc

The majority of my work is with data warehouse loads where 2 minute execution vs 4 minute execution is trivial.

Side note: I came to database programming (15 years ago?) from OO programing and attacked many problems with loops, I simply wasn't thinking in set operations. Some of the awful code I produced still wakes me up at night. It took about 6 months for set operations to click. Sadly, a power company in the midwest is probably still saddled with my stench!

Excellent! However, "I also dislike hungarian notation (dbo.tbl_MyTable, dbo.vw_trans), but if I am honest I have to admit that is really a personal stylistic preference."? Dude?

Have you ever worked in a database where all the names are common words like "Transaction", "Invoice', and my personal favorite, "[Group]". In my current world, the tables all begin with "tbl", and thank God they do - makes grepping a whole lot easier!

This is an excellent start to establishing list of SQL Code-smells. The reason I like the concept is that it is less prescriptive than the term 'best practice'. A bad smell in the fridge always needs investigation even if it turns out to be ripe Camembert cheese. I must admit that a some of my code has a bit of a whiff about it, but I always bristle a bit when I'm told it does not 'conform with Best-practice'.

Larry, I can understand why you would have a hard time grepping for references to a table named "transaction" or "group" since those words will appear in many procedures that have nothing to do with either table. But what if you used a different but common naming technique, using a collective name for the tables (Transactions, Groups)? A table is a collection of things and very rarely contains only one row. I am not saying that it is better for everyone but it certainly solves your grepping problem. Another is to ensure that you ALWAYS specify the schema prefix when referencing a table. In that case also, grepping for "dbo.Transaction" shouldn't yield any false positives - unless you also have "dbo.TransactionDetails" etc. In which case combining these practices would further ease your grepping woes: "dbo.Transactions" and "dbo.TransactionDetails"...

Seems like a poor excuse to add an otherwise utterly useless prefix to every single table. If it looks and smells like a table, you do not need to name it tblTable.

pjones - respectfully, I hate SQL injection as much as the next guy, but please keep in mind that it's handling parameters correctly that really prevents SQL injection. Using stored procs happens to favor the use of parameters, a fact which seems to have morphed into the idea that stored procs themselves prevent SQL injection. It's parameters that do the work and keep things safe. There are many safe ways to parameterize queries. (I don't want to get into the whole sprocs argument, as it's a discussion that's already completely played out elsewhere, and there's not much new to add.)

Merrill, very true. You can't just move unsafe code into a stored procedure, and it magically becomes "safe". I have often seen the following approach when people are told to get the SQL queries out of their application code and use stored procedures.

My approach to the business workflow type of locking, I typically have a timestamp column that I read and store the byte value when the application opens the "record". Then when the employ saves, I check the record's timestamp column and then compare it to my stored byte array. If they are different, then I warn the employee that the record has been modified outside of their "screen". This gives them a heads up.

On the "business locks" vs "transaction locks", I prefer to use the term "MUTEX" for the business locks, to avoid the confusion on "locks". A flag or timestamp of some sort on a row in a table, indicating to other resources trying to access that data that it's in-use, definitely falls under the heading of a mutex.

Understanding how they work, and what to watch out for (things like rows that never get let go because of an incomplete or failed action), are all covered under the general programming concepts for mutexes.

(Yes, this is a bit late to the game. Ended up here off of an article today on Simple-Talk.com.)