Geek City: How SQL Server Detects the Correct Versioned Rows

Here is a question I just received from the feedback page on my web site:

I have finished the book "The Storage Engine" and like it very much. I am now reading "Query Tuning and Optimization".

I know that under READ COMMITTED SNAPSHOT isolation, when a row is being modified in a transaction, SQL Server generates an old committed version so another transaction can read it without being blocked.

But I don't know how SQL Server uses SNAPSHOT isolation to prevent phantom reads from happening.

Under SERIALIZABLE isolation the key range or the whole table is locked, but under SNAPSHOT isolation SQL Server can NOT generate a committed version for an insert, since the row does NOT exist yet. So how does it know that the newly inserted data should not be included in the second SELECT statement?

-Tom

Tom is correct that INSERTs do not generate versioned rows. However, SQL Server is able to keep track of when each change was made under snapshot isolation. Once a database is enabled for snapshots, every row that is inserted, deleted, or modified gets an additional 14 bytes of overhead added to it. These bytes contain two pieces of information. The first is a pointer to the previous committed version of the row, which is stored in the version store in tempdb. This pointer is only used for deleted and updated rows, since there are no previous values for newly inserted rows.
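You can watch that overhead appear for yourself. This is only a sketch; the database and table names are made up for illustration:

```sql
-- Enable row versioning for snapshot isolation (hypothetical database name)
ALTER DATABASE TestSnapshot SET ALLOW_SNAPSHOT_ISOLATION ON;

-- After rows have been modified, the average record size grows by roughly
-- the 14 bytes of versioning overhead, which you can observe with
-- sys.dm_db_index_physical_stats (hypothetical table name):
SELECT index_id, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
        DB_ID('TestSnapshot'), OBJECT_ID('dbo.MyTable'),
        NULL, NULL, 'DETAILED');
```

Compare the average record size before and after enabling snapshots and updating the rows to see the difference.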

However, these extra bytes also include a value called the XSN, or transaction sequence number, which you can think of as a kind of timestamp for the database. Any database enabled for snapshots keeps an internal XSN counter that is incremented each time any change is made, or any snapshot SELECT is performed. The metadata also keeps track of all active transactions and what the XSN was when each transaction started; the DMV sys.dm_tran_active_snapshot_database_transactions exposes this information. So when you are reading data, SQL Server will look at the XSN value in each row, and not return any rows that have an XSN value greater than the XSN value at the time your transaction started.
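The visibility rule is easiest to see with two sessions. Again, this is just a sketch: the table and column names are invented, and it assumes the database already allows snapshot isolation:

```sql
-- Session 1: start a snapshot transaction
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT COUNT(*) FROM dbo.Orders;   -- suppose this returns 10

-- Session 2, meanwhile, inserts and commits a new row; that row is
-- stamped with an XSN later than session 1's starting XSN:
INSERT INTO dbo.Orders (OrderDate) VALUES (GETDATE());

-- Session 1: repeat the read inside the same transaction
SELECT COUNT(*) FROM dbo.Orders;   -- still 10; the new row's XSN is
                                   -- too new to be visible
COMMIT;

-- While session 1's transaction is open, you can see its starting XSN:
SELECT session_id, transaction_sequence_num
FROM sys.dm_tran_active_snapshot_database_transactions;
```

The second SELECT in session 1 sees exactly what the first one saw, which is how snapshot isolation avoids phantoms without range locks.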

Comments

So this should work for indexes too? If a query uses a covering index, it may never reach the leaf level of the data (or the clustered index, if one exists). So indexes must also carry the additional 14 bytes. But for index rows, which are usually small, an additional 14 bytes per row can have a pretty serious negative performance impact. Does it really work this way, or is there some optimization I can't think of?

I just ran a test on a table with a nonclustered index on a money column, so the key was 8 bytes, plus about 8 bytes of overhead in each row before snapshot isolation was enabled. I used DBCC IND to find an index page, and when I examined it there were 426 rows on the page.

I then enabled the database for snapshots and updated the money column for all rows in the table. When I looked at the index page again, there were only 213 rows on the page, and each row was marked as containing VERSIONING_INFO. So yes, this can have a big impact on space usage, and thus on performance.
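If you want to repeat the experiment, the general shape is below. The database, table, and page numbers are placeholders; you have to plug in values that DBCC IND actually returns on your system:

```sql
-- DBCC IND lists the pages belonging to an index (index_id 2 is assumed
-- here to be the nonclustered index being examined):
DBCC IND ('TestSnapshot', 'dbo.MyTable', 2);

-- DBCC PAGE dumps a single page; trace flag 3604 routes the output
-- to the client. Replace 1 and 12345 with a file:page pair taken
-- from the DBCC IND output above.
DBCC TRACEON (3604);
DBCC PAGE ('TestSnapshot', 1, 12345, 3);
-- In the page dump, versioned rows show VERSIONING_INFO in their
-- record attributes, and the row count per page drops accordingly.
```

Note that DBCC IND and DBCC PAGE are undocumented commands, so use them on test systems rather than in production.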

You didn't think you got snapshot isolation and non-blocking reads for free, did you? There are costs associated with snapshot isolation, in addition to this one.