Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It only takes a minute to sign up.

We have some consultants working on expanding an inhouse data warehouse. I was doing a code review and ran across this pattern in all of the load procs:

MERGE [EDHub].[Customer].[Class] AS TARGET
USING (
SELECT <columns>
FROM [dbo].[vw_CustomerClass]
WHERE JHAPostingDate = @PostingDate
) AS SOURCE
ON TARGET.BankId = SOURCE.BankId -- This join is on the business keys
AND TARGET.Code = SOURCE.Code
WHEN NOT MATCHED BY TARGET
THEN
<INSERT Statement>
WHEN MATCHED
AND TARGET.IsLatest = 1
AND EXISTS (
SELECT SOURCE.[HASH]
EXCEPT
SELECT TARGET.[Hash]
)
THEN
<UPDATE Statement>

The gist is, if we have a new business key, insert but if the business key exists and the hash of the attributes don't match our current row then update the old row and insert a new one (later in the code). It all works fine but I paused when I got to this code

AND EXISTS (
SELECT SOURCE.[HASH]
EXCEPT
SELECT TARGET.[Hash]
)

It seems overly complicated compared to SOURCE.[HASH] <> TARGET.[Hash]. The EXCEPT will do an accurate NULL comparison but in our case hashes will never be NULL (or we have bigger problems). I want our code to be easy to read so that when someone has to maintain it, it doesn't confuse. I asked our consultants about it and they speculated that it might be faster because of set operations but I decided to write a simple test (test code below).

The first thing I noticed was the EXISTS/EXCEPT had a more complicated query plan but that's not always bad

I ran each select client statistics on and the <> join yielded total execution time of 12,000 vs 25,000 with the EXISTS/EXCEPT. I want to take this to our consultants with the request to refactor that statement but wanted to get feedback here on:

2 Answers
2

I don't like ISNULL with sentinel values, it requires picking values that can't ever legitimately appear in the data now or forever after and personally I find expressions containing these more difficult to reason about.

For your test rig I tried four different ways of expressing the query and got the stated results.

SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE r.hash <> l.hash
OR ( r.hash IS NULL
AND l.hash IS NOT NULL )
OR ( l.hash IS NULL
AND r.hash IS NOT NULL )

All good points - I might run some live data tests this week on some of the larger tables. The varbinary(8000) is what hashbytes() outputs by default (at least for SHA2_256 which they are using)
– Bob ProbstApr 15 '18 at 17:25