Mobile Development at U2U Consult

Based in Brussels, Belgium, U2U Consult has been offering consulting & development services in the EMEA region for over 10 years. We offer mobile development on all platforms from Apple to Android & Microsoft.

This article describes where, why, and how to use TransactionScope in the Entity Framework 4.0. The proposed best practices apply to a medium to large data access layer, e.g. a DAL that is implemented as one or more WCF services with lots of internal calls. Allow me to start with the conclusions:

Always execute all data base access inside a TransactionScope,

always explicitely specify an isolation level, and

always explicitely open the entities' connection.

If you trust me, implement these rules in all your projects. If you don't trust me, continue reading to figure out why.

Why you should open the connection explicitely.

So you decided to continue reading. Thanks for the confidence .

Default data adapter behavior

Under the hood of the newest 4.0 Entity Framework, the real work is still done by ye olde ADO.NET 1.* data adapter. Such a data adapter needs a connection and a command to do its work. If the connection is open, the adapter executes the command and leaves the connection open. If the connection is closed, the adapter opens it, executes the command, and then politely closes the connection again. When using the Entity Framework, you can open the connection explicitely as follows:

If you don't open the connection explicitely, it will be opened and closed for you. That is convenient, but in a busy complex data access layer the same physical connection will be openened and closed over and over again. I now hear you saying 'What's wrong with that, the connection is pooled anyway, so there's no overhead in opening and closing.' Well, actually the data adapters behavior comes with a price, and I'm sure you will not always want to pay that price.

Performance impact

Data base connections are not created each time you need one. In most cases a connection is fetched from the connection pool. A pooled connection is always first cleaned up by the .NET data client with a call to the sp_reset_connection system procedure. A complete list of that procedure's actions can be found here. The list includes the following:

It resets all error states and numbers (like @@error),

it stops all execution contexts (EC) that are child threads of a parent EC executing a parallel query,

it waits for any outstanding I/O operations,

it frees any held buffers on the server by the connection,

it unlocks any buffer resources that are used by the connection,

it releases all memory allocated to the connection,

it clears any work or temporary tables that are created by the connection,

it kills all global cursors owned by the connection,

it closes any open SQL-XML handles that are opened by the connection,

it deletes any open SQL-XML related work tables,

it closes all system tables,

it closes all user tables,

it drops all temporary objects,

it aborts open transactions,

it defects from a distributed transaction when enlisted,

it decrements the reference count for users in current database,

it frees acquired locks,

it resets all SET options to the default values,

it resets the @@rowcount value,

it resets the @@identity value,

it resets any session level trace options using dbcc traceon(), and

fires Audit Login and Audit Logout events.

If you don't explicitely open the connection yourself, then every other call will be a call to sp_reset_connection.

Please don't start panicking now: SQL Server does all of this extremely fast. The demo application that you find at the end of this article clearly shows that even a high number of calls to the procedure incurs no noticable performance impact. These numerous calls only disturb your monitoring experience with SQL Profiler. The following images show a profiler session where the same workload is executed with and without explicitely opening the connection:

Default behavior

Explicit open

Although every other SQL command is a call to sp_reset_connection, there's hardly any impact on the response time:

But anyway there ís a performance penalty, and you can avoid it.

Distributed transaction escalation

The continuing opening and closing of a (pooled) connection has more than just a small performanct impact when you're running inside a TransactionScope. It can cause a local transaction to escalate to a distributed transaction unnecesarily. When a transaction spans more than one resource manager (database or queuing system) or involves too many connections, the .NET client decides to call help from the Distributed Transaction Coordinator service (MSDTC) in the Operating System. When that escalation exactly takes place depends largely on the version of SQL Server you're working against. The more recent versions sustain local mode longer. I only have SQL Server 2008R2 instances in my network. These instances even allow multiple connections to share the same transaction without escalating to MSDTC. The only way to force an escalation on my machine, is disabling connection pooling in the connection string (by adding Pooling=False).

Escalating to a distributed transaction takes a lot of resources: different services need to initialize and communicate. Here's a screenshot of the demo application, compare the response time for a regular escalated call to the rest (the very first escalation -when stuff needs to be initialized- takes between 4 and 8 seconds):

Of course some of the observed overhead comes from the lack of pooling. Anyway, I believe that the difference in response time is high enough to catch your attention. You can strongly reduce the escalation risk by minimizing the number of physical connections per transaction by explicitely opening the connection and keeping it open until you commit. Here's again a comparison of profiler traces, with and without explicit opening:

Default behavior

Explicit open

Opening the connection brings the response time of the sample workload back to normal:

Why you should use TransactionScope

Local transactions can only escalate to distributed transactions when you're running in a TransactionScope. So you might decide not to use TransactionScope after all. Unfortunately this is not an option. One of the few things that sp_reset_connection doesn't do is resetting the transaction isolation level. I personally consider this a frakking bug. According to Microsoft, the behavior is by design so this bug will never be solved.

If you want tight control over the performance and scalability of your data access, then managing the isolation level is absolutely crucial. There are two ways to control the isolation level in a DAL method:

Explicitely starting a transaction

I see three ways to start a transaction:

using SQL statements BEGIN-COMMIT-ROLLBACK

starting a transaction on the connection using .NET code, or

using TransactionScope.

The good thing about the first two techniques is that their default isolation level is READ COMMITTED, which is a nice trade-off between performance (minimal locking overhead) and reliability (no dirty reads possible). For this reason, READ COMMITTED is the default isolation level in the ANSI and ISO standards. Of course you could specify another level, if required. I will not elaborate on T-SQL transactions. You should only use these inside stored procedures and batch scripts, never in C# programs. Here's an example on using C# to start a transaction on the connection. Inside the transaction, you always know the isolation level (because you set it, or because there is a reliable default):

In the context of a larger application, these first two techniques are too tightly connected to the database: you need a reference to the physical connection to control the transaction and its isolation level. So these techniques are only applicable inside the data access layer. The DAL is not the place where we define what 'a logical unit of works' i.e. 'a transaction' is. That's a decision that should be taken in the business layer, that layer doesn't own the connection. So there are reasons enough to use TransactionScope. Here's how to use TransactionScope. As you see the scope requires no access to the physical connection:

// In the Business Layer

using (System.Transactions.TransactionScope scope = new System.Transactions.TransactionScope(TransactionScopeOption.RequiresNew))

For reasons that I don't understand, Microsoft has chosen the second worst default for TransactionScope isolation level: SERIALIZABLE. [In case you were having doubts: READ UNCOMMITTED would be the worst choice.] This means that if you don't specify a better isolation level yourself, you place a lock on everything you read, and due to lock escalation you'll probably even lock rows that you don't even read. Needless to say you'll end up with scalability issues.

Setting the isolation level in T-SQL

When starting a transaction on the connection, or when opening a TransactionScope, you can (and should) specify the appropriate isolation level. There's a third way to control the isolation level on a connection: T-SQL. This is again a low level technique that requires access to the physical data base connection. Here's an example on how to set the isolation level when using Entity Framework:

The isolation level and its corresponding locking strategy will be applied to all your queries on the connection, whether or not you wrap them in a transaction. If you use this technique, you should not forget to reset the isolation level to its original value at the end of your method. Unfortunately there's no straightforward way to determine the current isolation level for a connection. [Keep reading this blog: I'm working on a solution] If you don't explicitely set the isolation level to its previous value, the new value will remain active on the connection. .NET nor SQL Server will reset it, not even when your code was called from inside a transaction!

Ignoring the isolation level

If you don't take control of it, you have no idea in which transaction isolation level your queries will be running. After all, you don't know where the connection that you got from the pool has been: it could have come from a SSIS-package doing some bulk maintenance (where SERIALIZABLE is the default), or it could been returned from an application that monitors that same SSIS-package (through a READUNCOMMITTED transaction allowing dirty reads). You simply inherit the last used isolation level on the connection, so you have no idea which type of locks are taken (or worse: ignored) by your queries and for how long these locks will be held. On a busy database, this will definitely lead to random errors, time-outs and deadlocks.

Demo

To illustrate the arbitrary isolation level, the sample application contains the following code snippet that is executed after the transaction. It executes the T-SQL version of Thread.Sleep, which gives you some time to monitor it from the outside:

So you should always specify an isolation level yourself. According to SQL Server Books-on-line, setting the isolation level requires a solid understanding of transaction processing theory and the semantics of the transaction itself, the concurrency issues involved, and the consequences for system consistency. I couldn't agree more.

How to detect escalation

When using TransactionScope, there's always the risk of escalation to distributed transactions. So you should keep an eye on these. There are a couple of techniques to detect and monitor distributed transactions. Here are three of them:

SQL Server Profiler

You can also trace distributed transactions with SQL Server Profiler:

Stop MSDTC and watch the exceptions in your program

Letting the application crash by stopping the MSDTC service, is my favorite solution (well, at least in the developer or test environment). The trick only works with newer versions of .NET/SQL Server. In the past, MSDTC should run even for local transactions. That bug was fixed.