Today I want to look at what to consider when joining on columns that contain NULL values.

Natural, Composite, NULLable keys

Let’s pretend we have an Account table containing the accounts of various users and an AccountType table describing the different types of accounts:
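The table definitions are not reproduced here, so the following is only a minimal sketch of what the two tables might look like. The column names come from the queries later in this post; the data types, index names, and sample rows are assumptions made for illustration.

```sql
-- Hypothetical schema: column names follow the queries in this post,
-- but the data types, index names, and sample rows are assumptions.
CREATE TABLE dbo.AccountType
(
    YearOpened  int          NOT NULL,
    AccountType varchar(20)  NULL,      -- NULL is a valid (default) value
    Description varchar(100) NOT NULL
);

-- A nullable column cannot be part of a primary key, so the natural
-- composite key is enforced here with a unique clustered index instead.
CREATE UNIQUE CLUSTERED INDEX CL_AccountType
    ON dbo.AccountType (YearOpened, AccountType);

CREATE TABLE dbo.Account
(
    UserId      int         NOT NULL,
    YearOpened  int         NOT NULL,
    AccountType varchar(20) NULL
);

CREATE CLUSTERED INDEX CL_Account
    ON dbo.Account (YearOpened, AccountType);

-- Sample rows (assumed), including one NULL account type per table.
INSERT INTO dbo.AccountType (YearOpened, AccountType, Description)
VALUES (2017, 'Standard', 'Standard account'),
       (2017, NULL,       'Default account');

INSERT INTO dbo.Account (UserId, YearOpened, AccountType)
VALUES (1, 2017, 'Standard'),
       (2, 2017, NULL);
```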

These tables have the unfortunate design characteristics of:

They use a natural, composite key of YearOpened and AccountType

NULL is the valid default for AccountType

Not that either of the above attributes is outright bad; we just need to handle them appropriately. For example, if we want to bring back a description of each user’s account, we might write a query with an inner join like this:
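A plain equality join of this kind might look like the first sketch below. With the default ANSI_NULLS setting, `NULL = NULL` evaluates to UNKNOWN rather than TRUE, so rows whose AccountType is NULL never match and drop out of an inner join. A common workaround, shown second, wraps both sides in ISNULL; the 'N/A' placeholder is an assumed sentinel value and must not occur in the real data.

```sql
-- Sketch 1: plain equality join. Rows where AccountType is NULL are
-- dropped, because NULL = NULL evaluates to UNKNOWN, not TRUE.
SELECT
    a.UserId,
    at.YearOpened,
    at.AccountType,
    at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
        AND a.AccountType = at.AccountType;

-- Sketch 2: ISNULL workaround. 'N/A' is an assumed placeholder that
-- stands in for NULL on both sides so that the NULL rows match.
SELECT
    a.UserId,
    at.YearOpened,
    at.AccountType,
    at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
        AND ISNULL(a.AccountType, 'N/A') = ISNULL(at.AccountType, 'N/A');
```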

With ISNULL wrapped around the join columns, the execution plan contains Compute Scalar operators that force SQL Server to scan the indexes and compute a value for every row.

A More Efficient Solution

If using a function like ISNULL hurts the performance of our queries, what can we do instead?

SELECT
    a.UserId,
    at.YearOpened,
    at.AccountType,
    at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
        AND (a.AccountType = at.AccountType OR (a.AccountType IS NULL AND at.AccountType IS NULL))

This produces exactly the same results while allowing SQL Server to seek when possible and avoid costly row-by-row computations.

There are no seeks in this query’s execution plan since I don’t have any additional filters, but the lack of Compute Scalar operators should be enough to prove the point.

There are a few more variations that achieve the same results using different execution plans: writing a query that joins the non-NULLs and unioning it with a query that selects only the NULLs, using a computed column to convert the NULLs to non-NULL values, and so on. Whichever you pick, the key to good performance is to choose a solution that will not force SQL Server to compute values for every single row.
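As one illustration, the union variation might be sketched as follows. This is an assumed formulation rather than a tested recommendation: the first branch joins the rows with matching non-NULL account types, and the second branch pairs the rows where both sides are NULL. The two branches cannot return the same row, so UNION ALL is enough.

```sql
-- Branch 1: rows whose AccountType is non-NULL and equal on both sides.
SELECT
    a.UserId,
    at.YearOpened,
    at.AccountType,
    at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
        AND a.AccountType = at.AccountType

UNION ALL

-- Branch 2: rows where AccountType is NULL on both sides.
SELECT
    a.UserId,
    at.YearOpened,
    at.AccountType,
    at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
WHERE
    a.AccountType IS NULL
    AND at.AccountType IS NULL;
```

Neither branch wraps a column in a function, so the optimizer remains free to use the indexes on YearOpened and AccountType directly.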
