Great article - we've used MERGE for some time and it reduces the CRUD burden considerably.

However, it should not be used if you are using transaction replication. We've found (and there are others if you google) that replication does not recognise the MERGE statement correctly, and will issue a DELETE/INSERT instead of an UPDATE to the subscribers. This can then collide with any Foreign Key Constraints you may have set up, breaking the replication. Unfortunately this has meant we've had to remove the use of MERGE for a lot of our routines.

paul.barbin (1/16/2013): Curious to hear from others on whether they use the MERGE command in production and what its performance is like.

We began using it in our unit test scripts to build data for developers immediately following the database build (we're using SSDT 2012). We wanted to use MERGE so the scripts could run even if developers already had data in their database.

So, rather than having an insert script of 10,000 records, we use the MERGE command with the 10,000 rows supplied via a VALUES clause. This is EXTREMELY slow! It takes 30 minutes to populate a small database (5 GB) vs. 2-3 minutes using TRUNCATE/INSERT.
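For readers who haven't seen this pattern, here is a minimal sketch of a MERGE driven by a VALUES table constructor, the shape Paul describes (table and column names are made up for illustration; his actual seed scripts would have 10,000 rows in the VALUES list):

```sql
-- Hypothetical seed script: upserts reference data whether or not the
-- developer's database already contains rows.
MERGE INTO dbo.StatusCode AS tgt
USING (VALUES
    (1, N'Open'),
    (2, N'Closed'),
    (3, N'Pending')
) AS src (StatusCodeId, StatusName)
    ON tgt.StatusCodeId = src.StatusCodeId
WHEN MATCHED AND tgt.StatusName <> src.StatusName THEN
    UPDATE SET StatusName = src.StatusName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (StatusCodeId, StatusName)
    VALUES (src.StatusCodeId, src.StatusName);
```

The re-runnability is the appeal: the script behaves the same against an empty table or a populated one, which TRUNCATE/INSERT can't do safely.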

Paul

I use Merge in production all over the place.

For example, I have a series of databases that power websites. Thousands of hits per day. The data for them gets created/updated/deleted in a CRM, then has to be loaded into the databases for the sites. Pretty complex business rules on the load processes, and servers thousands of miles apart from each other.

Prior DBA had it all set up as Truncate then Insert, but I modified it all to use Merge. Used to be, we'd have "blank" periods a couple of times per day, and employees got used to (a) fielding questions about that from customers, and (b) making changes and not having them show up until (sometimes) late the next day.

I modified it all to use Merge, and set it to run hourly. No "blank periods", and minimal waits for data to go live. Everyone loves it.

However, since Merge has to lock the tables that are being reloaded, and usually needs a full table lock because of the scope of the action, it would have failed if I hadn't combined it with Read Committed Snapshot Isolation. That allows queries from the sites to keep running and not be blocked by the large Merge actions.
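For anyone wanting to try the same combination, turning on RCSI is a one-time database-level switch (the database name here is made up; WITH ROLLBACK IMMEDIATE kicks out in-flight transactions so the option can take effect, so schedule it accordingly):

```sql
-- Readers see the last committed row versions instead of blocking
-- behind the MERGE's locks.
ALTER DATABASE MyWebSiteDb
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
```

Note that row versioning adds tempdb load, so it's worth testing that side of it too, not just the blocking behavior.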

Tested without that, and it was bad. (Was in a test environment, so no impact on anyone. But if Merge had gone live without that testing, it would have been worse than the prior "the tables are temporarily empty" issues.)

So, it can be pretty useful, but test against expected concurrent load.

Also tried more traditional two-step update/insert actions, but Merge performed better than those in my case. I've seen it take longer, however, so, again, test vs more traditional methods.

Edit: Truncate+Insert will be faster. It doesn't require any join math between the source and destination. That alone will make it faster. If speed is the only concern, then truncate+insert will almost certainly be faster. However, that option also means you can't have FKs referencing the table, and all the other issues that go along with truncate. But if speed is really all that matters, use it.

@Brian, that delete/insert action in transactional replication is a known bug (or at least it used to be). We hit that problem many moons ago, and we weren't even using the MERGE command. MS said that if the key column of an index is updated, transactional replication treats it as a Delete/Insert pair. We had to have special stored procs on the subscriber to handle the error for us. When the Delete proc was called, we checked whether the delete would fail due to FK constraints; if so, we determined it was a "false" delete and ignored the delete command. The following command would be the insert, and we'd check whether the record already existed; if so, we'd just update it with the new values. That was way back in 2000 and 2005. No idea whether that's corrected in the newer versions.

@Gsquared, thanks. I was just curious, I haven't used MERGE outside of this one case. In this particular case, speed isn't crucial, but the time difference is SO great that it starts to become an issue. What we wanted was the ability to run the script regardless of the state of the db and MERGE gives that ability where "kill and fill" doesn't. Good to know that when used in other applications that it is as fast or faster.

There are some very solid, and pretty intuitive, ways to speed up Merge.

For my cross-server ETL, I set up a bulk-copy database that just copies the source tables from the other server. Very fast because it's minimally logged and doesn't incur DTC locks. Then I Merge from that. But that kind of "stage first, load second" isn't specific to Merge. I'd have done that no matter what my final choice was for loading the target tables, since it would apply equally to kill'n'fill, or update/insert (2-step), or anything else.

Another way to really speed up Merge is to make sure the columns in the ON clause, on both source and target, are properly indexed, and (preferably) that the source has index-enforced uniqueness. For complex source queries, it can be faster to load a temp table, index it, and then use that as the Merge source.
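A sketch of that "stage, index, then merge" approach, with made-up table and column names (the unique clustered index on the join key both helps the plan and guarantees one source row per target row, which MERGE requires anyway):

```sql
-- Stage the complex source query once.
SELECT c.CustomerId, c.Name, c.Email
INTO   #Source
FROM   staging.Customer AS c
WHERE  c.ModifiedDate >= DATEADD(DAY, -1, SYSDATETIME());

-- Index-enforced uniqueness on the ON-clause column.
CREATE UNIQUE CLUSTERED INDEX ux_Source ON #Source (CustomerId);

MERGE INTO dbo.Customer AS tgt
USING #Source AS src
    ON tgt.CustomerId = src.CustomerId
WHEN MATCHED THEN
    UPDATE SET Name = src.Name, Email = src.Email
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, Name, Email)
    VALUES (src.CustomerId, src.Name, src.Email);

DROP TABLE #Source;
```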

@Ryan.Polk: Yes, you can do that, but it is not the same. First, as you just demonstrated, it is easy to forget that you need to add the same condition to both the "when not matched by source" and the "when matched" clauses. But on top of that, if you compare the query plans and the I/O statistics for both options, you will see that my approach with the CTE performs better than your alternative. In other words, the CTE approach is easier on the eye and it performs better.
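For readers following along, the CTE technique being discussed is merging into a CTE defined over the target, so the filter is stated once instead of being repeated in multiple WHEN clauses. A rough sketch with invented names (not the actual code from the thread):

```sql
-- The CTE restricts the target slice once; WHEN NOT MATCHED BY SOURCE
-- can then only delete rows inside that slice.
WITH tgt AS (
    SELECT CustomerId, Name, Region
    FROM dbo.Customer
    WHERE Region = N'EMEA'
)
MERGE INTO tgt
USING #EmeaSource AS src
    ON tgt.CustomerId = src.CustomerId
WHEN MATCHED THEN
    UPDATE SET Name = src.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, Name, Region)
    VALUES (src.CustomerId, src.Name, N'EMEA')
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

Without the CTE, forgetting to repeat the `Region` filter in the "not matched by source" branch would delete every row outside the slice.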

I also found that the performance of MERGE is a bit worse than plain INSERT/UPDATE, with indexes in place; it is not a huge difference, but still noticeable.

Also, it is hard to debug/troubleshoot MERGE statements. With INSERT/UPDATEs, you can just highlight the SELECT part, with its joins and WHERE clause, and run it to see the results, and play with them until you are ready to do the actual inserts/updates.

With a MERGE statement, you have to cut the code out of it and run it on the side, or use an OUTPUT clause, which you have to type in or uncomment.

One pain, at least in 2008, is that you cannot use the results of the OUTPUT clause outside of the MERGE (as an inner query); you can only insert the results into a table variable, which you need to define and create first.
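The workaround being described looks roughly like this (names are illustrative, and it assumes a staging table #Source(CustomerId, Name) already exists): declare the table variable up front, then route the OUTPUT rows into it.

```sql
-- The table variable must be declared before the MERGE runs.
DECLARE @Changes TABLE (
    MergeAction NVARCHAR(10),
    CustomerId  INT
);

MERGE INTO dbo.Customer AS tgt
USING #Source AS src
    ON tgt.CustomerId = src.CustomerId
WHEN MATCHED THEN
    UPDATE SET Name = src.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, Name)
    VALUES (src.CustomerId, src.Name)
OUTPUT $action, COALESCE(inserted.CustomerId, deleted.CustomerId)
    INTO @Changes (MergeAction, CustomerId);

-- Now inspect what the MERGE actually did.
SELECT MergeAction, COUNT(*) AS RowCnt
FROM @Changes
GROUP BY MergeAction;
```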

So after the initial excitement, our team is back to using good old UPSERTs.
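For completeness, the "good old UPSERT" alternative several posters mention is the two-step update-then-insert, sketched here with invented names (again assuming a staging table #Source(CustomerId, Name); wrap both statements in one transaction so readers never see a half-applied load):

```sql
BEGIN TRANSACTION;

-- Step 1: update rows that already exist in the target.
UPDATE tgt
SET    tgt.Name = src.Name
FROM   dbo.Customer AS tgt
JOIN   #Source     AS src
  ON   src.CustomerId = tgt.CustomerId;

-- Step 2: insert rows the target doesn't have yet.
INSERT INTO dbo.Customer (CustomerId, Name)
SELECT src.CustomerId, src.Name
FROM   #Source AS src
WHERE  NOT EXISTS (SELECT 1
                   FROM dbo.Customer AS c
                   WHERE c.CustomerId = src.CustomerId);

COMMIT TRANSACTION;
```

Each half can be debugged as a plain SELECT, which is exactly the troubleshooting convenience the post above is after.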