Server Scalability

Whenever one of these threads performs a blocking operation (i.e. an operation that accesses the network or some other form of I/O), the thread remains idle while it waits for the operation to complete. If new requests arrive in the meantime, each of them is assigned its own thread.

Depending mainly on the rate at which new client requests arrive, and the time it takes for each request to be served, the number of threads executing (or blocked) on the server might grow rapidly.

Threads have a significant memory footprint (about 1 MB of virtual memory space each); too many simultaneous threads on a server can therefore exhaust the available memory long before processor utilization comes near 100%, making memory the main bottleneck for the server’s throughput.

In cases like these it is possible to improve scalability without changing hardware by using non-blocking calls when communicating with external resources: threads don’t waste time waiting for those calls to complete, and can instead be returned to the thread pool and reused to service other incoming requests.

Non-blocking I/O calls help in keeping the thread count low, removing the memory bottleneck and making it possible for the application to scale better on the same hardware.
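The difference between the two styles can be sketched without any database at all. In the minimal example below, Thread.Sleep and Task.Delay stand in for a synchronous and an asynchronous I/O call; the RequestHandlers class and its method names are assumptions made up for illustration.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class RequestHandlers
{
    // Blocking version: the calling thread sits idle for the whole wait.
    public static string HandleBlocking(int requestId)
    {
        Thread.Sleep(100); // stands in for a synchronous database or network call
        return "response " + requestId;
    }

    // Non-blocking version: the thread is released at the await and the
    // method resumes later, once the simulated I/O has completed.
    public static async Task<string> HandleAsync(int requestId)
    {
        await Task.Delay(100); // stands in for an asynchronous database or network call
        return "response " + requestId;
    }

    // Starts many simulated requests at once; none of them holds a thread while waiting.
    public static Task<string[]> HandleManyAsync(int count)
    {
        return Task.WhenAll(Enumerable.Range(0, count).Select(id => HandleAsync(id)));
    }
}
```

Calling HandleManyAsync(50) starts fifty simulated requests without dedicating a thread to each of them, whereas fifty concurrent calls to HandleBlocking would keep fifty threads occupied for the duration of the wait.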

Client Responsiveness

On the other hand, even with the progress and broad availability of high-speed network connectivity, latency is still one of the dominant factors in the usability of distributed applications. The usability of data-intensive applications in the cloud can degrade rapidly if their user interfaces spend most of the time blocked waiting for responses from the server.

The new Task-based Asynchronous Pattern (TAP) gives developers a simple way to turn the long-running methods that currently make the UI unresponsive into asynchronous ones.

Non-Goals

The following are things we are explicitly not trying to enable with the feature in EF6:

Thread safety

Async lazy loading

Thread Safety

While thread safety would make async more useful, it is an orthogonal feature. It is unclear whether we could ever implement support for it in the most general case, given that EF interacts with a graph composed of user code to maintain state and there is no easy way to ensure that this code is also thread safe.

For the moment, EF will detect if the developer attempts to execute two async operations at the same time and throw an exception.

Async Lazy Loading

Lazy loading was one of the most requested features we added in .NET 4. It is a powerful feature because it allows “virtualizing” the navigation over a graph of objects that is actually stored in the database, providing the illusion that the graph is completely loaded into memory. This allows for better separation of concerns and simpler code, but it has a number of disadvantages.

One of the main criticisms of lazy loading is that the cost of reading a property becomes nondeterministic. It seems that there is no place for this kind of nondeterminism in the scenarios in which we expect Task-based async to be critical.

In other words, there is an argument that someone who is optimizing for server throughput should not use lazy loading and should instead use eager or explicit loading.

However, there is a hybrid approach we could consider supporting in the future:
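The original code sample for this approach is not available here, so the following is only a hypothetical sketch of what such a model might look like. The Employee class, its Manager property and the AsyncNavigationSketch helper are assumptions; EF6 does not recognize or map properties of this shape.

```csharp
using System.Threading.Tasks;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Hypothetical "async navigation property": not supported by EF6.
    // The idea is that a generated proxy would override this virtual
    // property and start loading the related entity when it is first read.
    public virtual Task<Employee> Manager { get; set; }
}

public static class AsyncNavigationSketch
{
    // Navigation is still a property access, but it is explicitly awaited.
    public static async Task<string> GetManagerNameAsync(Employee employee)
    {
        Employee manager = await employee.Manager;
        return manager.Name;
    }
}
```

In this sketch the caller sees the asynchrony in the await, which is the loss of transparency discussed next.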

In this case, EF would need to recognize that a property returning Task<T>, where T is an entity type, is actually an “async navigation property”, and create the appropriate object-conceptual mapping for the underlying property. Since the property is virtual, EF can generate a dynamic proxy that implements the lazy loading.

We should keep in mind that using the Task-based async pattern with properties dilutes some of the transparency that lazy loading provides. That might be acceptable: thanks to TAP support in the language, the code doesn’t look too different from the sync version; the only difference is that the navigation becomes explicitly asynchronous.

Another important challenge would be how to refer to an async navigation property in a LINQ expression, given that the .NET languages currently don't support construction of lambda expressions containing await.

Dependencies

We are able to provide async support in EF by basing our implementation on the new async API in the ADO.NET provider model and on the async and await keywords introduced with Visual Studio 2012/.NET 4.5.

However, if a specific provider doesn’t implement the asynchronous methods, calls to them will fall back to synchronous execution without any warning.

Async support in EF requires .NET 4.5 and will not be available on .NET 4.

Design

We are aiming to introduce async versions of the operations that perform network I/O and could become the bottleneck on either the client or the server. This includes all operations that cause results to be materialized from the database (e.g. foreach, ToList, Single, SqlQuery, etc.) and operations that cause commands to be sent to the database (e.g. SaveChanges, ExecuteSqlCommand, etc.).

We are following the generally accepted standard of introducing a second, asynchronous version of each method, named with the Async suffix (e.g. SaveChanges and SaveChangesAsync).

Our main intent is to provide async versions of methods on the DbContext API. Where these methods require an asynchronous counterpart on the ObjectContext API we will also add that method (e.g. SaveChanges). In some cases we may also implement additional async methods on the ObjectContext API for the sake of completeness.

We are not attempting to provide asynchronous database/schema creation.

For methods that LINQ to Entities does not support (e.g. Last), we are not providing an Async method.

API Examples

Query

A typical example of code that causes a query to be sent to the database is iterating over the results of a LINQ query:
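The original sample is not reproduced here. As a minimal sketch, assuming the EntityFramework 6 package is referenced and a database is configured for the context, the synchronous loop and its asynchronous counterparts might look as follows. The Blog and BlogContext types are hypothetical and exist only for illustration; ForEachAsync and ToListAsync are the EF6 extension methods from the System.Data.Entity namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity; // EF6: DbContext and the ForEachAsync/ToListAsync extensions
using System.Linq;
using System.Threading.Tasks;

// Hypothetical model, used only for illustration.
public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class BlogContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }
}

public static class QueryExamples
{
    // Synchronous: the calling thread blocks while results are read.
    public static void PrintBlogs()
    {
        using (var context = new BlogContext())
        {
            foreach (var blog in context.Blogs.OrderBy(b => b.Name))
            {
                Console.WriteLine(blog.Name);
            }
        }
    }

    // Asynchronous counterpart of the foreach loop.
    public static async Task PrintBlogsAsync()
    {
        using (var context = new BlogContext())
        {
            await context.Blogs
                .OrderBy(b => b.Name)
                .ForEachAsync(blog => Console.WriteLine(blog.Name));
        }
    }

    // Asynchronous counterpart of ToList.
    public static async Task<List<Blog>> GetBlogsAsync()
    {
        using (var context = new BlogContext())
        {
            return await context.Blogs.OrderBy(b => b.Name).ToListAsync();
        }
    }
}
```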

Loading
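Individual entities and related data can also be loaded asynchronously. The sketch below assumes the EntityFramework 6 package and a configured database; the Blog, Post and BlogContext types and the CountPostsAsync helper are hypothetical. FindAsync and LoadAsync are the EF6 methods used for the asynchronous calls.

```csharp
using System.Collections.Generic;
using System.Data.Entity; // EF6
using System.Threading.Tasks;

// Hypothetical model, used only for illustration.
public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Post> Posts { get; set; }
}

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public int BlogId { get; set; }
    public virtual Blog Blog { get; set; }
}

public class BlogContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }
    public DbSet<Post> Posts { get; set; }
}

public static class LoadingExamples
{
    public static async Task<int> CountPostsAsync(int blogId)
    {
        using (var context = new BlogContext())
        {
            // Look the entity up by key without blocking the thread.
            Blog blog = await context.Blogs.FindAsync(blogId);
            if (blog == null)
            {
                return 0;
            }

            // Explicitly load the related collection, again asynchronously.
            await context.Entry(blog).Collection(b => b.Posts).LoadAsync();
            return blog.Posts.Count;
        }
    }
}
```

Each operation is awaited before the next one starts, so only one async operation is in flight on the context at any time.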

Raw SQL Queries
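Raw SQL can be executed asynchronously through the Database property of the context. The following is a minimal sketch, assuming the EntityFramework 6 package and a configured database; the Blog and BlogContext types, the dbo.Blogs table name and the helper methods are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Data.Entity; // EF6
using System.Threading.Tasks;

// Hypothetical model, used only for illustration.
public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class BlogContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }
}

public static class RawSqlExamples
{
    // Materializes the results of a raw query asynchronously.
    public static async Task<List<Blog>> FindByNameAsync(string pattern)
    {
        using (var context = new BlogContext())
        {
            return await context.Database
                .SqlQuery<Blog>("SELECT Id, Name FROM dbo.Blogs WHERE Name LIKE @p0", pattern)
                .ToListAsync();
        }
    }

    // Sends a command asynchronously and returns the number of affected rows.
    public static async Task<int> RenameAsync(int id, string newName)
    {
        using (var context = new BlogContext())
        {
            return await context.Database.ExecuteSqlCommandAsync(
                "UPDATE dbo.Blogs SET Name = @p1 WHERE Id = @p0", id, newName);
        }
    }
}
```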

Cancellation Tokens

All the async methods have overloads that accept a CancellationToken. Since the bulk of the execution time of these methods is expected to be spent waiting for the database operation, the cancellation is delegated to the ADO.NET provider.

The exception to this is ForEachAsync: in this method we can potentially have many opportunities to cancel the operation between the calls to the database.
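The idea of checking for cancellation between reads can be illustrated without EF. The helper below is a hypothetical stand-in written for this document, not EF's implementation: each element of the sequence represents one asynchronous read, and the token is checked before every read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationSketch
{
    // Hypothetical enumerate-and-act loop; each Task<T> in the source
    // stands in for one asynchronous read from the database.
    public static async Task ForEachSketchAsync<T>(
        IEnumerable<Task<T>> source,
        Action<T> action,
        CancellationToken cancellationToken)
    {
        foreach (Task<T> next in source)
        {
            // Opportunity to cancel between two reads.
            cancellationToken.ThrowIfCancellationRequested();

            T item = await next.ConfigureAwait(false);
            action(item);
        }
    }
}
```

If the token is cancelled while an item is being processed, the loop stops before the next read and the returned task ends in the canceled state.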

Limitations

While async has some real advantages, it is not for everyone. Asynchronous invocations can introduce overhead and can degrade performance if not used correctly. As with any performance-related change, establish goals and perform measurements before making any modifications to your applications.

Implementation Challenges

Code Duplication

All of the new methods have behavior and implementation equivalent to the existing synchronous ones. However, they return Task or Task<T>, which makes it difficult to unify the implementations without decreasing the performance of the synchronous methods, since Task creation usually means allocation of a new object.

We can still extract parts of the implementation as long as they don’t contain async method calls. Also, both implementations are placed consecutively in the source code, so it’s easy to change both when needed.

Performance Considerations

Some things that we have done to provide better performance are:

We are calling ConfigureAwait(continueOnCapturedContext: false) on the tasks before awaiting them. A large portion of the async overhead is marshaling the continuation back to the original context. In a library this is not only unnecessary, but could actually result in a deadlock when called from code where the context is important, such as a UI thread.

When there’s only a single place in the method where a Task is created and it’s a tail call (i.e. the last statement in the method), the method delegates to that call and returns its Task directly instead of awaiting it, which avoids an unnecessary await (but this is rarely the case).
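Both techniques can be sketched in a few lines. The LibrarySketch class and its methods are hypothetical and only illustrate the pattern; they are not taken from the EF source.

```csharp
using System;
using System.Threading.Tasks;

public static class LibrarySketch
{
    // Hypothetical stand-in for an asynchronous provider call.
    private static async Task<string> FetchAsync()
    {
        await Task.Delay(10).ConfigureAwait(false);
        return "hello";
    }

    // Awaits without capturing the caller's context, so the continuation
    // is not marshaled back to (for example) a UI thread.
    public static async Task<int> ReadLengthAsync()
    {
        string text = await FetchAsync().ConfigureAwait(continueOnCapturedContext: false);
        return text.Length;
    }

    // Tail call: the only Task is created by the last statement, so the
    // method is not marked async and simply hands that Task to the caller.
    public static Task<int> ReadLengthTailCallAsync()
    {
        return ReadLengthAsync();
    }
}
```

Because ReadLengthTailCallAsync is not an async method, the compiler generates no state machine for it and no extra await is performed.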

In the future we could consider the following options for further improving performance:

Further minimizing the number of async method calls on the stack. The compiler can’t inline methods in an async method, so each invocation in an async method carries overhead. Also, when an async method yields, all of its local variables are saved to the heap, which means that async methods should have fewer local variables.

As a last resort for improving performance, we can drop the async keyword and implement TAP manually with TaskCompletionSource<TResult>.
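As a rough illustration of the manual approach, the sketch below exposes a callback-based operation as a Task without using the async keyword. The ManualTap class and the BeginCompute operation are hypothetical and exist only for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ManualTap
{
    // Hypothetical callback-based operation that completes on a pool thread.
    public static void BeginCompute(int input, Action<int> onSuccess, Action<Exception> onError)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try
            {
                onSuccess(checked(input * 2));
            }
            catch (Exception ex)
            {
                onError(ex);
            }
        });
    }

    // TAP wrapper implemented by hand: no async keyword and no compiler-generated
    // state machine, only a TaskCompletionSource completed from the callbacks.
    public static Task<int> ComputeAsync(int input)
    {
        var completion = new TaskCompletionSource<int>();
        BeginCompute(
            input,
            result => completion.TrySetResult(result),
            error => completion.TrySetException(error));
        return completion.Task;
    }
}
```

The price of this approach is that completion, failure and cancellation all have to be propagated by hand, which is why it is listed as a last resort.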

Thanks for the article. Answered a question for me on the importance of having virtual on your navigation properties. I had a couple of navigation properties that did not have the virtual and I was getting "sequence contains no elements" errors but it would work just fine when I stepped into while debugging. After adding the virtual, it worked. The virtual enables Entity Framework to implement overrides that probably look something like (navigationPropertyLazyLoading {get { return navigationPropertyLazyLoadingTask.Result;}} )

In addition to ForEachAsync I think it would be awesome if there was a method that could return an IObservable<T> to the results of a query. I know that IObservable<T> doesn't fit completely within the async/await model at the moment (which is such a huge lost opportunity, async foreach anybody?) but with the addition of Rx and LINQ queries over them they are a great tool for working with potentially asynchronous operations with multiple potential results.