From the Archives: Why Snapshot Isolation is so Useful

In this post, originally written by Glenn Paulley and published on sybase.com in May 2009, Glenn discusses MVCC and snapshot isolation (which is also part of SAP HANA) and why it is so useful in providing a consistent level of performance as the number of concurrent users increases.

Why is support for snapshot isolation so important? Snapshot isolation adds another tool to the DBA’s toolkit, offering reasonable semantics that avoid various types of update anomalies while not incurring the considerable overhead and contention of serializable query execution strategies.

The term serializability characterizes execution schedules in which the interleaved execution of two or more database transactions occurs as if those transactions were executed serially (one following another). Prior to the idea of snapshot isolation, documented for the first time in reference [1] and subsequently implemented in Oracle, the way to achieve serializable transaction execution was through strict two-phase locking. However, the vast majority of database applications cannot tolerate the lack of concurrency that serializable execution entails; almost always, application developers are willing to trade off serializable semantics for improved concurrency (i.e., weaker transaction isolation), and snapshot isolation is an important type of weaker concurrency control.
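To make strict two-phase locking concrete, here is a minimal Python sketch; the account store and lock stand in for rows and row write locks and are illustrative only, not any real DBMS’s internals. Each transaction acquires the locks it needs up front and releases them only at commit, so two concurrent read-modify-write transactions behave as if executed serially.

```python
import threading

# A toy account store; the lock below stands in for a row write lock.
# Names are illustrative, not any real DBMS's internals.
balances = {"acct": 100}
locks = {"acct": threading.Lock()}

def transfer(amount):
    # Strict two-phase locking: acquire every needed lock up front
    # (growing phase) and release only at commit (shrinking phase).
    # Holding the write lock across the whole read-modify-write makes
    # the two transactions execute as if serial.
    with locks["acct"]:
        current = balances["acct"]
        balances["acct"] = current + amount  # commit point; lock released

t1 = threading.Thread(target=transfer, args=(50,))
t2 = threading.Thread(target=transfer, args=(-20,))
t1.start(); t2.start()
t1.join(); t2.join()
print(balances["acct"])  # 130 under any interleaving
```

Without the lock, the two threads could interleave their read and write steps and lose one of the updates; holding the write lock through commit rules out exactly those schedules, at the cost of the blocking described above.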

The ANSI/ISO SQL Standard defines isolation levels in terms of anomalies that may be avoided. They are (SQL:2008, Section 4.35.4, pp. 124-5):

The isolation level specifies the kind of phenomena that can occur during the execution of concurrent SQL transactions. The following phenomena are possible:

P1 (“Dirty read”): SQL-transaction T1 modifies a row. SQL-transaction T2 then reads that row before T1 performs a COMMIT. If T1 then performs a ROLLBACK, T2 will have read a row that was never committed and that may thus be considered to have never existed.

P2 (“Non-repeatable read”): SQL-transaction T1 reads a row. SQL-transaction T2 then modifies or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may receive the modified value or discover that the row has been deleted.

P3 (“Phantom”): SQL-transaction T1 reads the set of rows N that satisfy some search condition. SQL transaction T2 then executes SQL statements that generate one or more rows that satisfy the search condition used by SQL-transaction T1. If SQL-transaction T1 then repeats the initial read with the same search condition, it obtains a different collection of rows.
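The P2 (“non-repeatable read”) phenomenon, for example, can be made concrete with a toy, versionless single-row store in Python; the transaction labels and scheduling below are illustrative only, not a real transaction manager.

```python
# A toy single-row "database" with no concurrency control, to make the
# non-repeatable read concrete. The transaction labels in the comments
# and the fixed scheduling are illustrative only.
db = {"row": "v1"}

first = db["row"]     # T1 reads the row
db["row"] = "v2"      # T2 modifies that row and performs a COMMIT
second = db["row"]    # T1 rereads the row and receives the modified value
print(first, second)  # v1 v2
```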

It is well known [1] that the above definitions fail to describe all of the anomalies that may occur at isolation levels lower than SERIALIZABLE; the paper by Berenson et al. [1] is highly recommended for DBAs and application programmers alike as a guide to the anomalous behaviour that can be encountered at lower isolation levels. However, a common characteristic of ANSI isolation levels 1 through 3 is that a writer will block a reader. As with many other DBMS implementations, SQL Anywhere’s concurrency control is based on locking, and write locks cause blocking, except for read transactions at isolation level 0 (READ UNCOMMITTED), which offers no correctness guarantees. Moreover, writers always block other writers: SQL Anywhere does not permit “dirty writes” – termed P0 in reference [1] – at any isolation level, because of the ROLLBACK and recovery problems that would ensue if dirty writes were permitted.

The SQL standard does not specify how the P1, P2, and P3 anomalies are to be avoided; every database management system is free to implement its own solutions. For example, beginning with Version 10, SQL Anywhere uses intent locks with updatable cursors to prevent concurrent updates; intent locks permit read transactions to process the row, at least until the row is actually modified and the intent lock is upgraded to a write lock.

In addition to expanding on the ANSI isolation levels, reference [1] defined snapshot isolation: the basic idea is for each transaction to “see” a consistent snapshot of the database as of transaction start, and that this snapshot remains unaffected by other concurrent update transactions. Because of some quirks of terminology, many believe that snapshot isolation offers serializable semantics. However, snapshot isolation is not serializable [3] and some researchers have over the years proposed modifications to snapshot isolation so that it does offer serializability [4].
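The snapshot idea can be sketched with a toy multi-version store in Python; the timestamps and data structures here are invented for illustration and are not SQL Anywhere’s or Oracle’s actual implementation. Each committed write appends a new version of the row, and a snapshot read returns the newest version committed at or before the reader’s start timestamp, so later commits never change what the reader sees.

```python
# A minimal multi-version store: each key maps to a list of
# (commit_timestamp, value) pairs. A snapshot transaction reads the
# newest version committed at or before its start timestamp, so later
# commits never change what it sees. All names here are illustrative.
versions = {"row": [(1, "v1")]}
clock = 1  # timestamp of the most recent commit

def snapshot_read(key, start_ts):
    # Return the newest version with commit_ts <= start_ts.
    return max(v for v in versions[key] if v[0] <= start_ts)[1]

def commit_write(key, value):
    # Committing appends a new version at the next timestamp; old
    # versions are retained for readers with older snapshots.
    global clock
    clock += 1
    versions[key].append((clock, value))

reader_start = clock                           # reader snapshots at ts=1
commit_write("row", "v2")                      # concurrent writer commits at ts=2
r_before = snapshot_read("row", reader_start)  # snapshot stays stable: "v1"
r_after = snapshot_read("row", clock)          # a new transaction sees "v2"
print(r_before, r_after)                       # v1 v2
```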

Snapshot isolation as proposed in [1] and implemented in Oracle is based on first-committer-wins: if two transactions modify the same row – which this scheme permits – the first transaction to COMMIT “wins”, and any other transaction in conflict will be unable to COMMIT and must ROLLBACK. In contrast, snapshot isolation in SQL Anywhere is based on first-writer-wins, which (still) forces writers to block writers. This has the advantage of simplifying an application’s COMMIT logic, but the disadvantage of a greater risk of deadlock. However, SQL Anywhere’s snapshot isolation retains the benefit that writers do not block readers. This permits an application to “see” a consistent state of the database as of the start of the transaction, making it straightforward, for example, for a read-only transaction to analyze an entire database without regard to updates made by concurrent transactions. This is a very powerful benefit of snapshot isolation.
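The first-committer-wins rule can be sketched in a few lines of Python; the Txn class and timestamp bookkeeping below are invented for illustration, not the Oracle or SQL Anywhere code. At COMMIT, a transaction checks whether any row in its write set was committed by another transaction after its own snapshot was taken; if so, it has lost the race and must roll back.

```python
# A sketch of first-committer-wins: at COMMIT time a transaction aborts
# if any row in its write set was committed by some other transaction
# after this transaction's snapshot was taken. The Txn class and the
# timestamp bookkeeping are invented for illustration.
commit_ts = {"row": 1}  # last committed timestamp for each row
clock = 1

class Txn:
    def __init__(self):
        self.start_ts = clock  # snapshot taken at transaction start
        self.writes = {}

    def write(self, key, value):
        self.writes[key] = value  # buffer the write until COMMIT

    def commit(self):
        global clock
        # Conflict check: has another transaction committed any of our
        # rows since our snapshot? If so, we lost the race: ROLLBACK.
        for key in self.writes:
            if commit_ts.get(key, 0) > self.start_ts:
                return False
        clock += 1
        for key in self.writes:
            commit_ts[key] = clock
        return True

t1, t2 = Txn(), Txn()  # both start with the same snapshot (ts=1)
t1.write("row", "a")
t2.write("row", "b")
ok1 = t1.commit()      # the first committer wins
ok2 = t2.commit()      # the conflicting second committer must roll back
print(ok1, ok2)        # True False
```

Under first-writer-wins, by contrast, the conflict would surface earlier: t2’s write would block on t1’s write lock rather than failing at COMMIT, which is the trade-off between COMMIT-logic simplicity and deadlock risk described above.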

Of course, snapshot isolation doesn’t come for free. The database system must construct archive copies of changed data in anticipation of new snapshot transactions. With SQL Anywhere, copies of snapshot rows are managed automatically and written to the temp file (which grows on demand) as necessary. However, although the management overhead is near zero, query performance can suffer because snapshot rows may need to be fetched individually from the snapshot row store in the temp file, depending on the snapshot semantics of the transaction. The degree of performance degradation depends entirely on the application and its workload, and will be worse with update-intensive workloads. Nonetheless, those same update-intensive workloads may not perform well with traditional ANSI isolation levels based on locking, because of the lock contention that may be incurred and the greater potential for deadlock. Hence careful capacity planning should be undertaken before deploying such an application in a production setting.

NB. Links to papers are to freely available, public preprint versions.