Want to keep in touch with the
latest in SQL Server world? Email vyaskn@hotmail.com with
'subscribe' in the subject line

How to perform case sensitive searches in SQL Server?

Note: Information & code samples from this article are tested on SQL Server 2005 RTM (Yukon) and found to be working. Will update the article in case of any compatibility issues.

This is another one of those questions, that gets asked at least once a day in the Microsoft SQL Server programming newsgroups. Over the years, different programmers have come up with different solutions, and I'll be presenting some of those in this article.

A default SQL Server installation is case insensitive, which means that SQL Server will not differentiate between upper and lower case characters/letters. That is, "Oracle" is the same as "ORACLE" or "oracle". Let's see what controls this case sensitiveness of SQL Server. In SQL Server 6.5 and 7.0, you can select a 'sort order', while installing SQL Server. Sort order defines how the characters supported by the code page are compared and evaluated. The default sort order is 'case insensitive'. If you want SQL Server to perform case sensitive searches, you need to select binary sort order while installing SQL Server (6.5 or 7.0). In these versions (SQL Server 6.5 and 7.0), sort order is defined at the server level, and cannot be manipulated at the database level. If you want to change the sort order of an existing SQL Server 6.5 or 7.0 instance, you must rebuild the master database using rebuildm utility. Not a pleasant thing to do.

Terminology has changed, and now a sort order is called 'collation' in SQL Server 2000. Also ANSI SQL:99 uses the word 'collation'. By default SQL Server 2000 gets installed with case insensitive collation. You must change the collation of the server while installing, if you want case sensitiveness. But the good news is, SQL Server 2000 lets you specify collation at the database as well as column level. That means, you can have a default SQL Server installation with case insensitive collation, but a specific database or a set of columns can have case sensitive collation.

Lets see why one would want to perform case sensitive searches. Classic example is, password comparisons. People use a combination of upper and lower case (mixed case) characters in their passwords, just to make the passwords difficult to guess. But there is no point in doing that, if the database disregards the case. To keep the users happy, and their passwords secure, programmers prefer case sensitive comparisons in this case.

Here is another example: Some people tend to get the MOST out of their database. Yes, from a performance point of view, a binary sort order or case sensitive collation will perform faster, but only at the expense of the usability. Think about a user searching your table for the phrase "fish and chips", and not being able to find a match, just because, that phrase is actually stored in the database as "Fish and Chips". If this happens to be a database serving an online catalog, you will lose some sales as well. One way of avoiding this problem would be, to store everything in lower case, and covert user input to lower case before comparing with the table data. Too much hassle, isn't it? I wouldn't recommend this approach. Not worth the performance you gain.

Here are some methods you can employ for case sensitive searching in SQL Server:

All the examples in this article, are based on the following table and data. Requirement is to implement case sensitive search for CustID and CustPassword columns:

--NOTE: I omitted the VALUES list in the above INSERT statements, for simplicity. --This is not a best practice, and you should always include a VALUES list in your INSERT statements.

Consider the following example, that searches the Customers table for a given Customer ID and password. On a server/database with case insensitive collation, this query will disregard case, and will find a row, as long as the values match, no matter what case is used in the input. Since the @CustID 'usa00001' matches a CustID of 'USA00001' and @CustPassword 'theunbreakable' matches the CustPassword of 'TheUnbreakable', the following query prints 'Customer Found!':

Now our goal is to write queries that perform case sensitive comparisons. I will introduce you to some popular techniques, and show you how to make those methods efficient. It is important to note that, if and when you migrate the following queries onto a case sensitive database, it is better to get rid of the additional bits from the WHERE clause, and compare normally.

Method 1: Converting data to binary type before comparison(Works in both SQL Server 7.0 and 2000)

When you convert a character to binary or varbinary datatype, that character's ASCII value gets represented in binary. Since, 'A' and 'a' have different ASCII values, when you convert them to binary, the binary representations of these values don't match, and hence the case sensitive behavior.

The above example will print 'Invalid Customer ID or Password!', because the input provided is in all lower case, while the data in the table is stored in mixed case. Everything looks good. But if you observe the execution plan for this query (By pressing Ctrl + K in Query Analyzer, or by running SET SHOWPLAN_TEXT ON command), you will see an 'index scan'. An index scan is no good from performance point of view, as it means, scanning the whole index -- which is as bad as a table scan.

If you modify the above query as shown below, you will see an 'index seek', which is faster. You will realize this performance difference easily, on a huge table.

Curious, why the first query scanned the index? Here's why: When a column used in the WHERE clause is enclosed within a function (in this case CAST), the Query Optimizer (QP) cannot predict the outcome of the function in advance, and hence it has to scan the whole index and see if there's a match. So, to avoid the problem we added "AND CustID = @CustID AND CustPassword = @CustPassword" to the WHERE clause, and Optimiser made use of the clustered index.

Method 2: Using the COLLATE clause to dictate the case sensitiveness of the query(Works only in SQL Server 2000)

The COLLATE clause allows us to specify a particular collation for an expression. In the following examples, we will use COLLATE to make our search case sensitive. We need to specify a case sensitive collation along with COLLATE. The following example uses the collation SQL_Latin1_General_CP1_CS_AS. If you are working with non-English data, choose an appropriate collation.

The above example will print 'Invalid Customer ID or Password!', because the input provided is in all lower case, while the data in the table is stored in mixed case. But again, the execution plan shows an index scan. We can turn this into an index seek, by following the same trick, as in Method 1: So, let's rewrite the query:

Method 3: Using BINARY_CHECKSUM function(Works only in SQL Server 2000)

I've seen many people using BINARY_CHECKSUM function to perform case sensitive searches. BINARY_CHECKSUM() function accepts input, and returns a checksum value for that input. Though this is a useful function for tracking changes and verifying integrity of data, I don't think it is the appropriate for making case sensitive searches. It works for simple comparisons though. Here's an example:

As you can see, this technique works for comparing smaller strings. But the following script will prove that BINARY_CHECKSUM can return the same checksum value for different input values. This is very bad, especially when it comes to validating user names and passwords. Someone can bypass the authentication/authorization by specifying a password, which is not correct, but produces the same checksum value as the correct password. Try this script and find out why I don't recommend this approach:

Method 4: Changing the collation of the column permanently, so that all comparisons are case sensitive by default(Works only in SQL Server 2000)

SQL Server 2000 lets you specify collation at the column level also. So, you could make your CustID and CustPassword columns case sensitive by default. This saves a lot of effort, as you don't have to employ any special techniques in your queries, to get case sensitive behavior. Here's an example:

Method 5: Using computed columns(Works in both SQL Server 7.0 and 2000)

You could add a computed column to your table, that exhibits case sensitivity. A computed column is a virtual column that derives its value from the existing columns. SQL Server 2000 allows you to create an index on a computed column to make searches on these columns faster. We will exploit that feature in this example:

Method 6: Make use of client side languages like VB or VBScript for case sensitive comparisons(Works in both SQL Server 7.0 and 2000)

Instead of trying perform case sensitive comparisons on the backend, try taking advantage of your client side applications. This may not be appropriate for all situations, but I'm mentioning it here for the sake of completeness. The following code can be used in a Visual Basic application, to perform case sensitive searches: