Introduction

This article briefly describes how, by default, update operations block reads of the data being modified, and how this behaviour can be altered using row versioning. The advantage of row versioning is greater concurrency, but it also has side effects, which are described in this article.

In many cases, I've noticed that applications use the Read uncommitted isolation level to overcome locking-related problems (typically, the time an operation spends waiting for locks). Row versioning significantly reduces the need for the Read uncommitted isolation level: when the Read committed isolation level can be used without making operations too slow, the correctness of the results is significantly better.
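For reference, this is what the Read uncommitted workaround typically looks like (standard T-SQL, not from the original article; the Test table is the one used later in this article):

```sql
-- Session-level setting: subsequent reads may return uncommitted ("dirty") data.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM Test;

-- Equivalent per-query table hint:
SELECT * FROM Test WITH (NOLOCK);
```

Both forms avoid waiting for locks, but at the cost of possibly reading data that is later rolled back.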

This behaviour is similar to what is in Oracle databases, although the implementation is somewhat different.

Normal Behaviour

First, we need a simple database for testing, and a table with a row. In the simplest form, this can be created in an existing SQL Server instance using the following command:
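The original commands were not preserved here; a minimal setup consistent with the rest of the article (database RVTest, table Test with a single column Column1 holding the value 'For testing') could look like this:

```sql
CREATE DATABASE RVTest;
GO
USE RVTest;
GO
-- A single-column test table with one row.
CREATE TABLE Test (Column1 VARCHAR(50));
INSERT INTO Test (Column1) VALUES ('For testing');
GO
```
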

Now, when an application starts modifying the data, when done properly, modifications are wrapped inside a transaction. To simulate this, we modify the test row and leave the transaction pending.

BEGIN TRANSACTION;
UPDATE Test
SET Column1 = 'Modified value';

If the situation is observed from the database side using sp_lock, we can see that an exclusive lock is acquired on the row being modified. A few intent locks are also needed at higher levels, but the main point of interest is the row lock. The output would look like this:
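The original output was not preserved; a representative sp_lock excerpt (spids, object IDs, and resource strings will differ on your system) looks roughly like this:

```
spid   dbid   ObjId       IndId  Type Resource         Mode     Status
------ ------ ----------- ------ ---- ---------------- -------- ------
52     8      2105058535  0      RID  1:153:0          X        GRANT
52     8      2105058535  0      PAG  1:153            IX       GRANT
52     8      2105058535  0      TAB                   IX       GRANT
```

The RID row shows the exclusive (X) lock on the modified row; the PAG and TAB rows are the intent-exclusive (IX) locks at the page and table levels.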

So far, no problems. Now, when another connection tries to read the same data, it tries to acquire a shared lock on the row. Because an exclusive lock has already been granted on the row, a shared lock cannot be granted. In this situation, the connection requesting the shared lock is placed in a queue to wait for the lock.

SELECT * FROM Test;

When we use sp_lock again, the wait state can be seen as follows:
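Again, a representative excerpt (identifiers will differ); note the Status of the second session's shared-lock request:

```
spid   dbid   ObjId       IndId  Type Resource         Mode     Status
------ ------ ----------- ------ ---- ---------------- -------- ------
52     8      2105058535  0      RID  1:153:0          X        GRANT
53     8      2105058535  0      RID  1:153:0          S        WAIT
```

Session 53 is queued (Status WAIT) behind the exclusive lock held by session 52 on the same row.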

This situation remains until the transaction from the first session is committed or rolled back. When the transaction ends, exclusive locks are removed, and after that, shared locks can again be granted, and the operation for the second session can continue. In a busy system, the situation in the queues can be like in a gift shop on the day before Christmas.

Using Row Versioning

Row versioning is a database setting which can be modified using the ALTER DATABASE command. To enable row versioning, set READ_COMMITTED_SNAPSHOT to on:

ALTER DATABASE RVTest SET READ_COMMITTED_SNAPSHOT ON;

When executing the command, the database can be in multi-user mode, but there must be no other connections to the database at the same time. If there are, the command doesn't return until all other connections are closed.
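Whether the option is on can be verified from the sys.databases catalog view (standard T-SQL, not shown in the original article):

```sql
-- Returns 1 in is_read_committed_snapshot_on when row versioning is enabled.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'RVTest';
```
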

Now that the behaviour is changed, let's revisit the previous situation. First, one session modifies the test table again:
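These are the same statements as in the first test:

```sql
BEGIN TRANSACTION;
UPDATE Test
SET Column1 = 'Modified value';
```
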

At this point, nothing is changed. The session has the same locks as previously, but when another session executes a SELECT on the same data, we can see the difference:

SELECT * FROM Test;

Now, the query returns immediately, giving the following result:

Column1
-------
For testing

Most importantly, we see that locks are honoured: we don't see the uncommitted data, as we would with the Read uncommitted isolation level. Instead, the result is the last committed state of the row.

So, what actually happened? When the modification was made (the UPDATE statement), SQL Server took a copy of the data before it was changed and placed this copy in tempdb. When the row was queried, the database engine noticed that the row had uncommitted modifications and, based on transaction sequence numbers, read the original data from tempdb.

Trade-offs

Since there's no such thing as a free lunch, this behaviour has some trade-offs. The most significant are:

Tempdb space usage. Because versioned data is stored in tempdb, this database must have enough space for all modified rows in all databases in the same instance that have row versioning on. This can radically increase tempdb usage. Space used in tempdb is not freed immediately when it is no longer needed; instead, a separate background thread cleans up unneeded version data once in a while (normally, once a minute).

The amount of I/O increases. First, the data must be written to tempdb, and when queried, it must be fetched from there.

The amount of CPU used increases because of the management operations this feature requires.

Data rows consume more space. Each row must carry the transaction sequence number along with a pointer to the versioned row.

For LOB fields, each data fragment has 40 bytes less room because of the increased header information. For this reason, the database can grow significantly compared to earlier versions of SQL Server.
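The version store's current contents can be inspected with a dynamic management view (available from SQL Server 2005 onwards; a rough sketch, as the exact column list varies by version):

```sql
-- Number of version rows currently held in tempdb, per source database.
SELECT database_id, COUNT(*) AS version_rows
FROM sys.dm_tran_version_store
GROUP BY database_id;
```

Running this while a versioning-heavy workload is active gives a feel for how much tempdb space the feature is consuming.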

Conclusion

Row versioning is easy to set up, and it enables much higher concurrency in environments where the same data is modified and read simultaneously. It uses more resources, so an existing server setup may prove inadequate. When used correctly, it can have a very positive impact on the overall throughput of an application.

If the application relies on the earlier behaviour, in which reading was blocked while an active transaction was modifying the data, row versioning shouldn't be used, or the application must be redesigned where needed.


About the Author

I've been a programmer since the mid-80s, using languages like assembler, C/C++, PL/I (in a mainframe environment), Pascal, VB (I know, I know, no comments please) and C#, and utilizing different techniques and tools.

However, I specialize in databases and database modeling. I have mostly used products like Oracle (from version 6), SQL Server (from version 4.2), DB2 and Solid Server (nowadays an IBM product).

For the past 10+ years my main concern has been different business processes and how to create software to implement and improve them. In my spare time (whatever that actually means) I also teach and consult on different areas of database management, development and database-oriented software design.

Good article. I'm interested in the tradeoff of querying performance vs. the added tempdb I/O. Do you happen to know of any real-life situations where enabling this option will definitely be beneficial?

The question is a bit hard to answer briefly, since there are so many implementation-specific factors that affect this.

I've preferred row versioning over normal locking for some time now, and in every situation I've encountered it has had a good impact on the system.

I would say that in a system where, system-wide, data is mostly modified and simultaneous read access is rare, there's no need to use this, since it's mostly overhead. However, many systems have 'hot spots' which are constantly updated and queried, and these areas often turn out to be the bottleneck of the system.

Also, in many databases I've encountered, time-consuming reporting may run while normal updating occurs. Row versioning helps in these kinds of cases as well.

If possible, try this in a test environment with real test cases (along with a stress test). This helps you estimate the I/O volume and space usage. If you run the tests both with the option on and with it off, you'll get a good indication of whether it's beneficial or not.