Proactive Database Administration

Code Review

In software development, code review is a simple and effective way to reduce the number of bugs that creep in. Merely having a second pair of eyes look at a piece of code can often reveal logic that was implemented incorrectly or weakly, or misses use cases never considered. (Test-driven development is a separate discussion entirely; T-SQL unit testing is not the easiest thing in the world to do, and compromises sometimes have to be made, usually due to time constraints or politics.)

There are many good reasons why code review is a good practice, and I’m not going to get into that here. You can use Google as well as I can, or you can start by reading some of these questions on Software Engineering Stack Exchange. The purpose of this post is to consolidate many of the different things I look at when doing code review, or even when I’m looking over my own code. While this won’t be a complete list, and the items will vary somewhat depending on your company’s environment, it should be a good start.

In no particular order:

Syntax, including version-specific language use. This seems really obvious, but it can be easy to slip up. For example, if you normally do your development on a 2008+ server and the application still has to support 2005, it’s easy to write something like DECLARE @i int = 0 and only have this get caught during testing. (Testing is done against all versions of the database engine that are supported by the application, right? …right?) In terms of outright syntax, pay particular attention to T-SQL that’s contained in string variables, meaning dynamic SQL, or SQL statements embedded in client applications, as these aren’t validated until they are run through the containing code. Do all INSERT statements have a column list, and are all required columns specified?

Data/business logic protection. This could apply to either designing tables (which I do recommend code-reviewing, too), or when writing code to manipulate data (in the tables or otherwise). Are there enough constraints on the tables to prevent invalid data from entering the system? Are there any logical gaps in a piece of code? Does the piece of code accomplish the prescribed task? If in doubt, ask the developer what was intended. Sometimes logical gaps can be okay if a condition is rare, and the code is appropriately guarded. These same ideas apply to things like procedure parameters, particularly if any of the parameter values come from application user input.

Dynamic SQL. Follow through all logic paths to make sure the statements are being concatenated correctly for both syntax and potential injection attack vectors. Make sure the statements are parameterized appropriately.

Code Cleanliness. This is a bit tough to define. I personally don’t expect strict adherence to a style guide, for example, but I do expect code to be legible, and make appropriate use of indentation, line breaks and comments. Everyone has their own natural style — and even that can change over time — so I think giving people more or less free reign over style is fine. Again, as long as the code is legible to others, because that is the most important thing.

Version- or edition-specific feature use. This shouldn’t come up, as it should have been decided during the design phase, but at the end of the day, a developer can check anything in to the version control system.

Use of unexpected methods to accomplish a task. Does the code use a Rube-Goldberg-like approach to accomplish the task? Are cursors being used when all that’s necessary is a slightly more difficult to write set-based operation? In these types of cases, I will always go back to the developer and find out why this kind of approach was taken, and what was intended to be accomplished. Most of the time, this kind of situation provides a great learning opportunity.

Query plan reuse. Does the code reuse (or not reuse) query plans as expected? Pay particular attention to dynamic SQL: for example, there can be subtle string concatenation bugs where two generated statements (via different code paths) only differ by whitespace, but this is enough to produce a separate query plan. Sometimes dynamic SQL is used to force the creation of an alternative plan for the expected data size — does this work as expected?

Performance. Does the code meet the performance requirements, if any, and/or does it execute in a “reasonable” amount of time with a “reasonable” amount of resource usage? Does the code use table variables or temporary tables, and how does this interact with the query plan(s)? Does the code produce efficient query plans that will scale? It’s important to test code on a real data set to find out these things. Some of this is technically a “testing” activity; however, I would say it’s unlikely a testing department will have the technical skill required to determine if the code is doing the right thing. Reading and understanding query plans is a huge topic in itself; in short, look for inefficient operators that are unexpected, such as table scans (particularly as the bottom operator in a nested loops join), huge row counts (thick arrows), missing indexes, etc. Speaking of indexes, determine if the code would benefit from index changes, even if query performance is currently acceptable. Even if you don’t make those changes now, record your findings, as it could be valuable information to have later if performance does become a problem. This could potentially save an incredible amount of time and effort doing a workload analysis.

Deprecated features. This is fairly broad, as deprecation can happen for a wide range of things, from major features (i.e., database mirroring) to minor T-SQL syntax (i.e., using HOLDLOCK without surrounding parentheses). That said, for every new release of SQL Server, it’s a good idea to thoroughly read the official list of deprecated features, and make sure no new code is using any of it unless absolutely necessary (and in that case, there needs to be a plan to modify the system to remove the deprecated feature usage). I have a script on the Scripts & Code resource page that dumps the internal performance counters that track usage of deprecated features, which I think may be more broad than the official reference list. This method is great if you want to find out what your application is really doing in production, so you can proactively fix it long before it becomes a problem.

That’s a pretty good start! If you think I’ve left out something important, let me know in the comments and I’ll add it to the post, as I want this to be a good reference guide.