Coleman

Batching Large Updates in SQL Server

Sun, Dec 21, 2014

Note: The following is a simple, pure T-SQL approach to batching updates (or any kind of DML operation) in SQL Server. Consult your friendly database admin or system admin to make sure this solution is appropriate for your system.

Let’s say you’ve got a large number of rows in a table that need updating, but you don’t want to update them all at once in a single UPDATE statement.

Why would you want to avoid using a single statement? One reason is that doing a large update on a live SQL Server database can cause significant overhead for SQL Server Replication. If you really need to do an update on an important server, and bringing down replication temporarily is not an option, using T-SQL to batch your updates might be what you want.

So let’s see how to do it, using a fake “Products” table as an example. The product_id is our primary key and has data type int. Our data currently looks like this:

product_id  type  price
1           A     10.99
2           A     10.99
3           B     10.99
4           B     10.99

Our desired outcome is to update every price to 12.99 for type A products, and 8.99 for type B products. After we’re done, our data will look like this:

product_id  type  price
1           A     12.99
2           A     12.99
3           B     8.99
4           B     8.99

First, we need to back up our data. If we screw up, we'll be able to restore our original state using this backup table.

SELECT *
INTO zProducts_backup
FROM Products p
WHERE p.type in ('A', 'B');
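If we do screw up, the backup table lets us put the original prices back. A restore might look like this (a sketch, assuming product_id is the join key):

```sql
-- Copy the original prices back from the backup table.
UPDATE p
SET p.price = b.price
FROM Products p
INNER JOIN zProducts_backup b
    ON b.product_id = p.product_id;
```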

Now we can do the actual work. In a single SQL script, we can define the following.
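The script below is a sketch of the approach, using a batch size of 10 (tune this to your workload):

```sql
-- Table variables: @products holds the ids of every row to update,
-- @batch holds the ids being processed in the current pass.
DECLARE @products TABLE (product_id int PRIMARY KEY);
DECLARE @batch    TABLE (product_id int PRIMARY KEY);

-- Queue up all the work.
INSERT INTO @products (product_id)
SELECT p.product_id
FROM Products p
WHERE p.type IN ('A', 'B');

WHILE 1 = 1
BEGIN
    -- Empty the batch from the previous pass.
    DELETE FROM @batch;

    -- Take the next 10 ids off the top of the queue.
    INSERT INTO @batch (product_id)
    SELECT TOP (10) product_id
    FROM @products;

    -- Nothing left to consume: we're done.
    IF (@@ROWCOUNT = 0)
        BREAK;

    -- Update only the rows in the current batch.
    UPDATE p
    SET p.price = CASE p.type
                      WHEN 'A' THEN 12.99
                      WHEN 'B' THEN 8.99
                  END
    FROM Products p
    INNER JOIN @batch b
        ON b.product_id = p.product_id;

    -- Remove the finished batch from the work queue.
    DELETE FROM @products
    WHERE product_id IN (SELECT product_id FROM @batch);
END
```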

What’s going on here? First, we create an in-memory table variable called @products to hold the ids of all the rows we want to update. Then, we populate that table with the ids of all type A and B products (this statement looks almost identical to our initial backup). You can think of @products as a queue of work. Next, we declare a table variable @batch, which will consume rows from @products, 10 at a time.

With these tables declared, we step into a WHILE loop with WHILE 1=1. We immediately delete everything from @batch, though on the first iteration @batch will be empty anyway.

Next, we insert 10 rows into @batch, selecting them off the top of @products.

If this insert affects more than 0 rows, the IF (@@ROWCOUNT = 0) statement will evaluate to false, and our script will keep running. If the opposite happens, that means we consumed all the work from @products, so we can BREAK out of our loop (we’re done). You can read more about T-SQL’s @@ROWCOUNT in the SQL Server documentation.

After that, we finally get to the actual UPDATE statement, which operates on the “real” Products table. I use a CASE expression here, but any method that works will do. Notice that only the rows currently present in @batch will be updated in this pass of the loop.

Once the update is done, we delete the recently-batched records from the in-memory @products table. The loop starts over again, and we immediately empty out @batch. The loop will continue until there’s nothing left in @products. Then your work is done!
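As a sanity check once the loop finishes, a quick aggregate can confirm every row landed on its new price (a sketch; adjust to your schema):

```sql
-- Each type should show a single price: A at 12.99, B at 8.99.
SELECT type,
       MIN(price) AS min_price,
       MAX(price) AS max_price
FROM Products
WHERE type IN ('A', 'B')
GROUP BY type;
```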

As a final note, long-running queries should be run on a machine that is guaranteed to stay online for the duration of the operation. So try not to run these things on your personal laptop.