Introduction

As a software developer, I am often called upon to write SQL queries to retrieve data from the database. Since being the DBA is part of my role, I have found it quite natural to do so. However, I have come across a number of developers who do not understand how to write an efficient T-SQL command. This article is designed with the developers in mind but it will be equally useful for database developers who want to brush up on their skills. This particular article focuses on the basics of how to retrieve data from a database.

Purpose

Throughout this article I will cover the basics of how to access data from a SQL database as well as my suggestions for how to best use each of these tools. Most of these techniques should work on any modern version of Microsoft SQL. I will do my best to point out when commands are only available in certain versions of SQL. All queries have been designed to run against the AdventureWorks database. Microsoft provides this sample database for free on their website.

Please note that it is outside the scope of this document to really discuss in depth how these queries should be called (from code directly or through stored procedures). I would say that, when in doubt, use a stored procedure.

T-SQL Test Platform

Included with this article is a very simple test platform that I developed to help you see how these commands would run. With this, you can play with the different options and mix and match the different commands in a safe manner. The tool is not meant to be production-ready code. It is simply a way to quickly execute these commands without needing to open a SQL command window. I’ve included the EXE for those who don’t want to load the project and build it. This project also comes loaded with all of the samples that I give you in this article.

Standard Commands

The first thing you learn when being taught how to access data in a database are the following commands: SELECT, FROM, WHERE, ORDER BY. These command are the foundation on which most queries are built. SELECT tells the system what columns you want, FROM says where to get the columns, WHERE filters the results, and ORDER BY puts the results into the desired order. Here is a basic example of this type of query:

Best Practice: The ORDER BY command can order in ascending or descending order. This is specified by an ASC or DESC after the column name. If you do not specify anything, the column is sorted ascending. However, it is good practice to write this out instead of leaving it implied. Best Practice: While SELECT * will give you every column, avoid this method if at all possible. By specifying which columns you want, you will reduce the bandwidth needed to transfer the results to the client. Key Point: You do not need to have a column listed in the SELECT statement in order to use it in the WHERE clause. It simply needs to be a column in one of the referenced items in the FROM statement. Key Point: Query sections get evaluated in the following order: FROM, WHERE, SELECT, ORDER BY. This means that if you create an alias (see below) in your SELECT statement, the WHERE statement will not be able to reference it (since it hasn’t been evaluated yet) but the ORDER By statement can use it. Key Point: Query results are not ordered unless specifically given an order. It may seem like they are always returned in the same order without an ORDER BY statement but this is not reliable. If you want your records returned in a specific order, you must specify it.

Filtering Results with WHERE

We already touched on how to use a WHERE statement above but I wanted to go into a bit more depth on what you can do and, more importantly what you should and should not do using a WHERE statement. The WHERE statement at its best utilizes the table’s index in order to quickly limit what data is returned in the result set. For example, this query takes advantage of a nonclustered unique index on the Name column:

What if, however, we wanted every bottom bracket in our database, not just the HL Bottom Bracket? In that case, we could use the LIKE keyword instead of the equals in our WHERE statement and use the percent (%) sign to represent a multi-character wildcard. That query would look like this:

That gives us the results we wanted but at a fairly significant cost. The first query used an Index Seek (very efficient) while this second query uses an Index Scan (inefficient). The reason for this is because we put the wildcard at the beginning of the statement. Sometimes you cannot help this. For example, in this query it would be very difficult to find all of the items with the text “Bottom Bracket" in their text. Moving the wildcard anywhere else in the text will allow the system to do an Index Seek again.

Best Practice: When you have multiple statements in the WHERE section, use parenthesis to ensure the proper order of execution. For example, (Name=’Wrench’ AND Color=’Blue’) OR Color=’Black’ is different from Name=’Wrench’ AND (Color=’Blue’ OR Color=’Black’). Best Practice: As we learned above, avoid putting a wildcard character at the beginning of the search text. This will dramatically improve the performance of your query. Best Practice: Avoid performing functions on the left side of a WHERE statement. For example, the following would be functionally equivalent to the query we did above but it would actually perform even worse than our existing poorly performing query:

Limiting Results with TOP

If you only want a certain number of records to be returned, you can use the TOP command to specify either the number of records or the percentage of records to return. Note that this does not filter the results, it just grabs the specified number (or percentage) of records off of the top of the query and returns them. This command can be useful in a number of ways. One way this can be useful is if you want to return only the top five items from your inventory based upon their cost. You would simply order the query by cost in descending order so that the most expensive product is on top and then you would select only the top five records. Here is an example of doing this:

Key Point: The method I showed you of finding the most expensive items does not take into account items with the same price. In my example, all five items that return have the same Standard Cost. If I only wanted the top three, which ones would it give me? The answer is the first three it found. This order might change without notice unless you had specified a second column to order by as well.

Linking two (or more) tables or queries together with JOINs

Linking multiple tables or queries together using a JOIN statement is a large topic on its own. While I could cover it in depth, there is already a great article on CodeProject that covers the issue very well. Here is the link: Visual_SQL_Joins.aspx. It even has pictures.

I will just add my comments and ideas on how to best implement joins. The one thing you will notice especially here but also throughout the rest of this article is that I advocate being explicit in whatever you do. Don’t assume the next person to come along will know all of the implicit rules. Besides, it usually makes the query more readable to write things out fully.

Best Practice: Using just the keyword JOIN implies an INNER JOIN. However, write it out. Don’t make the next person to come along remember if it is INNER or OUTER. Best Practice: You can specify how the two tables are joined in the FROM statement or in the WHERE statement. For the sake of clarity, specify this information in the FROM statement. When you put it in the WHERE statement, it confuses the join information with the filtering information. Best Practice: The keyword OUTER may be omitted when you specify the type of OUTER JOIN (LEFT, RIGHT, FULL). This may be more of a preference since it becomes obvious that it is an OUTER join but include the keyword OUTER.

Renaming items using the AS keyword

Sometimes a table has a column name that isn’t what you would want it named. It might be cryptic (GL00101 – this is an actual column name in the Microsoft Dynamics Great Plains database), too long (TheTotalNumberOfProducts), too generic (Name), or just not to your liking. Whatever the reason might be, you can rename the column by simply adding the AS keyword followed by the new name (as long as the new name does not conflict with an existing column in your query). Here is an example:

I limited the results to three since this is a demonstration query. Notice that I renamed the Name column to be Product_Name so that it was easier to read. It would make even more sense if we had a query that contained a customer’s name as well as the product name.

I intentionally made this query a bit tricky just to show off what we have already discussed. Notice that the WHERE clause references the Name column while the ORDER BY clause references the Product_Name column. Remember this is because the WHERE clause is evaluated before the SELECT while the ORDER BY is evaluated after the SELECT. If we wanted to use the name Product_Name in our WHERE clause, we could do so like this:

Basically we need to create the query (without limits) inside of parenthesis and use that as the table in the FROM statement. Notice that we named the entire nested query as Prod. You need to name the query something so that you can reference it. The end result of all of this is the same except that we could use the column name Product_Name in our WHERE clause. When doing this, be careful to place items in the right query. For example, I put my “TOP 3" limiter in the outer query instead of the inner query because otherwise I would have gotten bad results (we need to apply our WHERE to the entire table, not just three records).

Finding ranges of data using BETWEEN and IN

We can use the typical operators like >, <, and = in our WHERE clause to find the data we want. However, there are a couple of extra commands that we can put in our toolbox as well. The first command we will look at is the BETWEEN operator. Here is an example of how to use this operator:

If you look at the execution plan, these two statements actually evaluate to the same plan which means there is no performance gain or loss by using the BETWEEN operator. The biggest thing we gain is a simpler statement. Note that the BETWEEN operator includes the lower and upper ends so if there were an item that sold for 2 dollars it would be included in this list. The same would be true for an item that cost one dollar.

The next operator we want to look at is the IN operator. This operator tells the system to find an exact match to one of the items in the list. The list can have two or more items in it. Here is an example of how to use the IN operator:

While the execution plan does not write out the translation of this query, it executes the same plan for both of these queries. Again, this means that the IN operator does not cause any performance gains or losses. It is simply an easier way to write out our query.

Advanced Topics

I have intentionally not covered some of the advanced topics including grouping of results, temporary tables, parameters and other important T-SQL topics. I intend to cover these and more in my next article. SQL has a lot to offer developers. While it would be great if you could read a good SQL book or two completely in order to get a full picture of how to build SQL statements properly, I intent to give you the condensed version in these articles.

Conclusion

In this article, we have learned how to use the basic commands and operators in T-SQL statements. With this knowledge you can build simple commands to gather data from one or more tables in Microsoft SQL. Along the way we discussed some of the best practices that should be followed as well as a few gotchas that we should be aware of when building our queries. I hope you have found at least something that could be of benefit to you from this article. I appreciate constructive feedback and look forward to your thoughts on how this article could be approved.

Share

About the Author

I am currently a Senior Software Developer at a company in Illinois called DeGarmo. My primary skills are in .NET, SQL, JavaScript, and other web technologies although I have worked with PowerShell, C, and Java as well.

In my previous positions, I have worked as a lead developer, professor and IT Director. As such, I have been able to develop software on a number of different types of systems and I have learned how to correctly oversee the overall direction of technology for an organization. I've developed applications for everything from machine automation to complete ERP systems.

I enjoy taking hard subjects and making them easy to understand for people unfamiliar with the topic.

When I first read "... you create an alias ..., the WHERE statement will not be able to reference it", I thought "totally false", then I thought I'd better check what you mean. Your meaning is totally true, but there are more places where an alias can be created. Suggest changing to "... you create an alias in the SELECT clause ..."

You said "Avoid performing functions on the left side of a WHERE statement." Suggest changing to "Avoid performing functions in a WHERE statement." The impact of putting a function in a where clause is equally felt if it is either on the left or right.

I respectfully disagree with always using "INNER" in a JOIN. The JOIN ALWAYS wants to do an inner join. You HAVE to do something special to NOT get an inner join. On the other hand, even though OUTER is also optional, I tend to agree that a best practice is to always say OUTER when you mean outer. Even if you have someone who can't keep the concept of inner and outer straight, I don't see how they could forget when they join a group following this practice. (I will follow BP "standards" for any group I join. I will not fight the use of INNER where that is already a standard. I'll not fight NOT specifying outer if that is the standard. I'll only put my 2 cents in when the standards are being set.)

I rarely encounter a group that consistently uses the AS keyword where it is implied. Because of this, I also rarely use it, but I can definitely see the benefit of consistently using it.

Thanks for the comments and suggestions. I'll go through them in order with my thoughts.

I think that stating that WHERE clauses cannot reference aliases in the SELECT clause is a good clarification. As I stated, the FROM clause is the first to be evaluated so aliases there can be referenced in the WHERE clause. I implied that I was discussing SELECT aliases but it wouldn't hurt to be more clear about it.

Actually, there are times when performing a function on the right hand side of the WHERE clause is the lesser evil. There can be a performance difference. I won't say it is great idea to put a function in the WHERE clause but the performance will be different depending on which side you put your function.

Well, I respect your opinion on the use of INNER. My philosophy, though, is that it never hurts to be explicit but it can hurt to be implicit.

I appreciate your feedback. Even if we don't always agree, I am happy to hear how others view these topics. I will put that one change on my change list for an upcoming modification of the article.

I don't see how the performance for
"WHERE 'Bottom Bracket' = SUBSTRING(Name,4,14)"
would be any more efficient than your where example. Assuming your Name field is indexed, it would still go through the entire index list. (You probably can come with an example, its just not a good idea for someone new to SQL using functions at all if they can help it.) You are also using an example where you might not be able to avoid using the function in the where clause. Neither where clause examples would pick up the same set of names in all situations because the like would pick up 'Peruvian Bottom Bracket' and the substring would exclude it and vs. versa for 'xp Bottom Bracket for the USA'

To get both selection criteria applied, the best performance might come from:

SELECT *
FROM (SELECT Name,StandardCost
FROM Production.Product
WHERE Name LIKE '% Bottom Bracket') AS a
WHERE LEN(Name) = 17

That is because the like goes through the entire table and the second assumes you don't want the names where the "Bottom Bracket" text could be repeated at the start and then at the end of the string. You would have to use the SUBSTRING query in the latter case.

There are two points why you don't want functions in your where clause. It's appropriate not to list them in the article because this is a beginner's article, but they are 1. Functions are executables, it costs CPU(time) to execute them. 2. They are not predetermined values, the entire set of values in the table need to be evaluated against the function. (Assuming no other criteria has eliminated the table rows first.)

The place where I saw this first, it said don't put functions that produce a constant value in your where clause. By far, the majority of times, I see this, rather than functions that produce variable values. When it does come up, I'm usually hard pressed to figure out any way to avoid putting the function in the where clause.

Because the above LIKE example would be evaluated first, the above example could use the AND to the function in a single select statement and the performance would be the same.

Note that selecting all the results in a subquery and applying a where clause outside the subquery like in another of your examples would negate any advantage an index on the Name clause would give you. The performance would be the same as if a Name index didn't exist on the table.

Thanks for letting me know about this. I have submitted an email to CodeProject to get these items corrected. The broken links will be a fix on their end. As for the description, that was my bad. I accidentally kept the description from a previous article (I was re-using the header and forgot to clear it out first). I appreciate you letting me know.