Technical content about Microsoft data technologies. All opinions expressed are purely my own and do not reflect positions of my employer or associates.

Transaction Isolation / Dirty Reads / Deadlocks Demo

I have just given a talk with my group at work on the basics of transaction isolation, dirty reads (nolock) and deadlocks, as a way to get everyone on the same page about the tradeoff between concurrency on a busy transactional system and accuracy of reports. I essentially repackaged a very good example posted by Alexander Kuznetsov into a simpler script that I could demo to people who are less T-SQL savvy. I hope that he will forgive my re-use of his scenario - I give him full credit for inventing the example :-). Since I'd put together the scripts, I thought I'd throw them up here in case anyone else wants a quick way to demo this. Others have posted similar demos with different emphases. This particular demo has the advantage (?) that, by the nature of the schema, it is very deadlock prone. Both deadlocks and inaccurate dirty reads are readily reproducible.

So, in a test database, I created two tables, one Entities (People) and one Accounts. In the Entities table, I inserted 1000 "random" people. In Accounts, I inserted two rows for each person: account 1 and account 2, with a column for the balance. Everyone gets $50 in each account. I then created a script that performs repeated transactions, moving $10 for one random person from one of his/her two accounts to the other. The gist of the demo is to run several simultaneous connections with that script, to simulate load on the system, then run aggregate queries on the accounts and show scenarios that produce incorrect totals and scenarios that deadlock.
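Since the original setup script isn't reproduced here, a minimal sketch of the schema and seed data might look like the following (table and column names are my assumptions, not necessarily those in the original scripts):

```sql
-- Assumed names: dbo.Entities, dbo.Accounts
CREATE TABLE dbo.Entities
(
    EntityID int NOT NULL PRIMARY KEY
);

CREATE TABLE dbo.Accounts
(
    EntityID      int   NOT NULL REFERENCES dbo.Entities (EntityID),
    AccountNumber int   NOT NULL,  -- 1 or 2
    Balance       money NOT NULL,
    PRIMARY KEY (EntityID, AccountNumber)
);

-- 1000 people...
INSERT dbo.Entities (EntityID)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

-- ...with $50 in each of their two accounts
INSERT dbo.Accounts (EntityID, AccountNumber, Balance)
SELECT e.EntityID, v.n, $50
FROM dbo.Entities AS e
CROSS JOIN (VALUES (1), (2)) AS v(n);
```

With this shape, the invariant is easy to state: the system-wide total should always be $100,000, and each person's two accounts should always sum to $100.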

To add a bit of realism, you can associate random names with the Entities table, using this technique.

Then, we have a script that mimics an app connection performing a lot of transactions. The script just moves money from one account to another, so the total balance in the system should remain the same, as well as the balance for every person, making it easy to repro inaccuracies from dirty reads. In a demo, I typically paste this into four separate query windows and get them all running at the same time. They do deadlock periodically, which is a good discussion point. There's logic to continue running after a deadlock:
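The original load script isn't reproduced here; a sketch of the idea (the names, the 10 ms pause, and the exact retry shape are my assumptions) could look like this:

```sql
-- Move $10 between a random person's two accounts, forever.
-- A CATCH block swallows deadlocks (error 1205) so the loop keeps running.
SET NOCOUNT ON;

DECLARE @EntityID int, @From int;

WHILE 1 = 1
BEGIN
    -- Pick a random person and a random source account (1 or 2)
    SELECT @EntityID = 1 + ABS(CHECKSUM(NEWID())) % 1000,
           @From     = 1 + ABS(CHECKSUM(NEWID())) % 2;

    BEGIN TRY
        BEGIN TRANSACTION;

        UPDATE dbo.Accounts
        SET Balance = Balance - $10
        WHERE EntityID = @EntityID AND AccountNumber = @From;

        -- A short pause widens the window in which another connection
        -- can observe (or collide with) the half-finished transfer.
        WAITFOR DELAY '00:00:00.010';

        UPDATE dbo.Accounts
        SET Balance = Balance + $10
        WHERE EntityID = @EntityID AND AccountNumber = 3 - @From;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() <> 1205
            THROW;  -- re-raise anything that isn't a deadlock
    END CATCH;
END;
```

The pause between the debit and the credit is what makes the demo so reproducible: it guarantees a window in which the books don't balance.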

With that script running (x4), in a fifth separate query you can run the following aggregates one at a time to see or demonstrate what is accurate, what deadlocks, and so on. It helps to fire them several times and see the variation in results:
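The original aggregate queries aren't reproduced here; queries along these lines (again assuming my table names) exercise each isolation level against the invariant that the total should always be $100,000:

```sql
-- Dirty reads: fast, never blocks, but the total drifts from $100,000.
SELECT SUM(Balance) AS Total FROM dbo.Accounts WITH (NOLOCK);

-- Default READ COMMITTED: no dirty reads, but still not a consistent total.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT SUM(Balance) AS Total FROM dbo.Accounts;

-- REPEATABLE READ and SERIALIZABLE: accurate, but block and deadlock more.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT SUM(Balance) AS Total FROM dbo.Accounts;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(Balance) AS Total FROM dbo.Accounts;

-- SNAPSHOT: accurate and non-blocking, if the database has
-- ALLOW_SNAPSHOT_ISOLATION enabled.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT SUM(Balance) AS Total FROM dbo.Accounts;

-- Per-person check: any row returned is a person whose two accounts
-- don't sum to $100 at the moment of the read.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT EntityID, SUM(Balance) AS Total
FROM dbo.Accounts
GROUP BY EntityID
HAVING SUM(Balance) <> $100;
```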

As many others have explained, the only ones that are 100% accurate are Repeatable Read and higher isolation levels. Read Committed and lower will give variable sums both for the table and for each simulated "person." An exception is if you explicitly use (tablock), which is interesting but probably unsuitable for production :-). The demo also shows the opposite problem, which is the increasing likelihood of blocking and deadlocks as you tighten the isolation level. Advantages of Snapshot are also apparent.

Lastly, it's a decent example to explain deadlocks, and also how to avoid them. In this particular case (this isn't generalizable to other cases) most deadlocks are prevented by locking the two account rows for a person at the beginning of the transaction, instead of having one lock, then a pause, then the other:
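One way to sketch that fix (my variant, not necessarily the original script) is to take update locks on both of the person's rows up front. Because every connection then acquires the pair in the same index order, no lock cycle can form:

```sql
DECLARE @EntityID int = 123, @From int = 1;  -- example values
DECLARE @LockBoth int;

BEGIN TRANSACTION;

-- Lock BOTH account rows for this person before touching either balance.
SELECT @LockBoth = COUNT(*)
FROM dbo.Accounts WITH (UPDLOCK, HOLDLOCK)
WHERE EntityID = @EntityID;

UPDATE dbo.Accounts
SET Balance = Balance - $10
WHERE EntityID = @EntityID AND AccountNumber = @From;

WAITFOR DELAY '00:00:00.010';  -- the pause no longer invites deadlocks

UPDATE dbo.Accounts
SET Balance = Balance + $10
WHERE EntityID = @EntityID AND AccountNumber = 3 - @From;

COMMIT TRANSACTION;
```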