Friday, October 3, 2008

Back to Basics

It's exceedingly easy to forget almost everything you learned in school, even if happened to pay attention at the time. Inevitably some of those same topics you may have covered in your youth will rear their head in your day job (especially if you took an IT related degree/diploma).

The other day I got into a discussion about database concurrency anomalies and how setting isolation levels can mitigate concurrency issues. When pressed to speak to different kinds of concurrency issues, all I could remember off the top of my head was the "dirty read". This post is an effort to re-cement the topic into my mind (at least for a while).

Concurrency in Databases

Remember that the DBMS has the unfortunate responsibility of fielding requests for CRUD operations for many different clients at the same time. When multiple users access the same data at the same time, sooner or later someone is going to step on another's toes. When a shoving match ensues over who gets to read/write on the same piece of data, the DBMS looks to you the developer to tell it how to behave when a lock needs to occur.

Remember that these problems below don't necessarily affect just stored procedures. It's equally easy to have these affect you from some Java or C# whether you're using transactions or not. It all depends on the isolation level that you asked for (or that you got defaulted to). It's worth mentioning that in both SQL Server and Oracle the default locking behavior is Read Committed. This is also true for most RDMBS'.

Dirty Reads (Uncommitted Dependency)

Transaction1 begins modifying a record.

Before Transaction1 commits the update, Transaction2 reads the change.

Transaction1 rolls back the change.

In this scenario Transaction2 is said to have performed a dirty read. It has read data which was never committed to the data store. You usually don't need to worry about these in SQL Server or Oracle as the default isolation level for these DBMS' is Read Committed (below), which only reads data that has been committed to the data store.

Non Repeatable Read (Inconsistent Analysis)

Transaction1 reads a record.

Meanwhile Transaction2 updates the samerecord and commits its changes.

Transaction1 reads the same record again (in the scope of the same transaction) but the value has changed!

In this scenario Transaction1 could have been running in Read Committed (which reads only records that have been committed to the data store), but still have values change underneath it in the midst of a transaction. To protect yourself against this type of anomaly you need to run in an isolation level of Repeatable Read (or higher).

Phantom Reads

Transaction1 reads a set of records (or maybe aggregates a value off a set).

Transaction2 inserts records into that set.

Transaction1 reads that set of records again and notices that some records have magically appeared.

Phantom reads can happen even if you're running an isolation level of Repeatable Read. You can really only protect against Phantom Reads by running in a isolation level of Serializable (Oracle or SQL Server) or Snapshot (SQL Server).

Who Cares?

The conversation of "what isolation level are these running at?" usually only comes up when you get into trouble or you're planning for big scalability. Normally the default isolation level (Read Committed) offers enough protection to get you through the traditional OLTP application and most ETL jobs you'll end up writing. There are scenarios however when knowledge of these isolation levels can save the day.

Relaxing the isolation level on your transactions can yield a lot more performance for your application by allowing more users to concurrently access data. Running in an isolation level of Read Uncommitted is great for the case that you simply don't care if you end up with dirty reads.

However, if you need gestapo like isolation on a series of actions against the data store (building an important financial report where you need the data to remain consistent while you work with it), an isolation level like Serializable or Snapshot could be just what you're looking for.

Isolation Levels

It's important to note that isolation levels only affect the behavior surrounding reads in SQL Server (and most other DBMS'). A transaction always gets an exclusive lock on any data it modifies and holds that lock until the transaction completes, regardless of the isolation level set for that transaction.

Below is a table of isolation levels available in SQL Server/Oracle, the anomalies they protect against and the locks that they use to enforce the behavior.

Isolation Level (SQL Server)

Dirty Read

Non Repeatable Read

Phantom Read

Behavior

Read Uncommitted

Yes

Yes

Yes

No locks are acquired or honored when reading data. Highest degree of concurrency.

Read Committed

No

Yes

Yes

Application only reads values that have been committed to data store.

Repeatable Read

No

No

Yes

Application puts locks on all records it reads during the life span of the transaction.

Serializable

No

No

No

Application puts locks on all record it reads and prohibits (locks)insertions/deletions of records into the set of records that it is operating on.

Using Isolation Levels

Below are some examples of how to use a specific isolation level in both T-SQL and C#.

Wrapping it Up

Like we mentioned earlier, using the defaults will work 90 percent of the time. However, being aware of how isolation levels fit into the puzzle and how they can help you just might be the piece of information that saves the day. I've seen web applications that frequently deadlock become quite useful, and reporting apps that consistently get the "wrong" data cured by spending a couple of hours looking at isolation levels.

Who knows dependency between isolation levels and database performance? What do you think about experiment, that is described in the following article: http://community.blazemeter.com/knowledgebase/articles/65143-using-jdbc-sampler-in-jmeter-2-6

About Me

Tyler Holmes is a Solutions Architect working in Portland, Oregon. He lives mostly in the MS tech stack and is currently treading the waters of Communication/Collaboration and Business Intelligence with off the shelf/open source technologies.