Correctly Building Asynchronous Libraries in .NET

Building an asynchronous library requires very specific design patterns that can be quite different from the patterns used when consuming an asynchronous library. But if you follow some basic rules you can greatly improve the experience for the consumers of your libraries.

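Application developers tend to think of the distinction in terms of when control is returned to the caller.

Synchronous: Control is returned when all of the work is done. The function blocks until then.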

Asynchronous: Control is returned immediately.

Library authors, on the other hand, look at it in terms of what resources are being consumed. If it is CPU bound, then it is synchronous. If it hardly touches the CPU (e.g. because it is I/O bound) then it is asynchronous.

To this effect, Lucian goes on to say that library methods should follow these rules:

Define an Async method if and only if you are not thread-bound.

Define a synchronous method if and only if you have a faster synchronous method that won’t deadlock.

Application developers will look at the signature and assume that you are following these rules. For example, if they see a synchronous method they will assume that they can safely parallelize it using the thread pool. But if it is async, they will assume that spawning extra threads would be wasteful and instead parallelize the work by invoking the function in a tight loop on a single thread.
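As a rough sketch of how a consumer might act on those assumptions, the snippet below fans a synchronous method out over the thread pool and starts an asynchronous method repeatedly from a single thread. The names ConsumerSketch, Checksum and FetchLengthAsync are hypothetical, and Task.Delay merely stands in for real I/O.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class ConsumerSketch
{
    // Hypothetical CPU-bound library method, so it has a synchronous signature.
    public static int Checksum(int seed)
    {
        int hash = seed;
        for (int i = 0; i < 100000; i++) hash = unchecked(hash * 31 + i);
        return hash;
    }

    // Hypothetical I/O-bound library method; Task.Delay stands in for the I/O.
    public static async Task<int> FetchLengthAsync(int id)
    {
        await Task.Delay(50).ConfigureAwait(false);
        return id * 2;
    }

    // Synchronous signature: the caller spreads the work over thread pool threads.
    public static int[] ChecksumAll(int[] seeds)
    {
        var results = new int[seeds.Length];
        Parallel.For(0, seeds.Length, i => results[i] = Checksum(seeds[i]));
        return results;
    }

    // Async signature: the caller starts every operation from one thread
    // and then waits for all of them together, without spawning threads.
    public static Task<int[]> FetchAllAsync(int[] ids)
    {
        return Task.WhenAll(ids.Select(id => FetchLengthAsync(id)));
    }
}
```

If the library's signatures misrepresent what the methods really do, both of these consumer strategies end up wasting threads.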

To this effect there are some basic rules:

“Don’t use Task.Run in Libraries”

Threads, especially thread pool threads, are a globally shared resource that belongs to the application developer. The library writer should never use Task.Run or any other method to create threads. It is the right and responsibility of the application writer to decide when, or even if, additional threads are warranted.

Consider a library method that wraps a synchronous call such as IO.DownloadFile in Task.Run and exposes the result as a Task. Because the call is synchronous, it will block a thread in the thread pool. The runtime will eventually detect that the thread is blocked and add another one to the pool. But that takes time and only goes so far. Eventually you will hit the max thread pool size, which may be much lower than the number of truly asynchronous calls that you could have supported.

Meanwhile, you may be starving the rest of the application of needed thread pool resources. Being a library author rather than the application developer, you don’t have knowledge of what else may need those threads.
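The kind of wrapper this section warns against might look roughly like the sketch below. DownloadSketch and its methods are hypothetical; Thread.Sleep stands in for a blocking download and Task.Delay for a genuinely asynchronous one.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class DownloadSketch
{
    // Hypothetical blocking call, standing in for something like IO.DownloadFile.
    public static string DownloadFile(string url)
    {
        Thread.Sleep(100); // the calling thread is blocked for the whole wait
        return "contents of " + url;
    }

    // Anti-pattern: asynchronous in name only. A thread pool thread sits
    // blocked inside DownloadFile until the transfer finishes.
    public static Task<string> DownloadFileFakeAsync(string url)
    {
        return Task.Run(() => DownloadFile(url));
    }

    // Truly asynchronous shape: no thread is held while the operation is pending.
    public static async Task<string> DownloadFileAsync(string url)
    {
        await Task.Delay(100).ConfigureAwait(false);
        return "contents of " + url;
    }
}
```

Both asynchronous methods return the same result, but a thousand concurrent calls to the fake version would need a thousand blocked pool threads, while the true version needs none while it waits.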

“Don’t use Task.Run on the Server”

Task.Run is never appropriate in server code when the goal is high scalability. In order for an application to scale efficiently, shared resources such as threads need to be carefully curated in order to prevent waste. Ideally you are only running one thread per core. If you create more than that then you waste CPU cycles on context switching and memory on the thread’s stack. (On Windows, that generally means 1 MB per thread.)

If the server is tuned for low latency instead of high scalability, then using Task.Run may make sense. But again, this is a decision for the application developer to make, not the library author.

Task.Run on the Client

On the client side there are many reasons to use Task.Run, but they all exist at the application level. The library code lacks the context to decide that a given operation needs to be pushed onto a background thread. The application code may be already on a background thread when the library function is called. Or it may be interacting with the UI, in which case it needs to stay on a UI thread. (Note: In WinRT/XAML programming there may be more than one UI thread in an application.)

For these and other reasons Lucian says,

If you are using Task.Run in your library, you are putting up roadblocks that are keeping the user of your library from optimally using the thread pool.

Exception: Multi-threading and WinJS

When writing libraries that will be consumed by JavaScript for Windows 8/WinRT (WinJS), you may need to use Task.Run despite the previous concerns. The reason is that WinJS is incapable of spawning new background threads and needs the library to do it instead.

The Windows 8 design guidelines state that any CPU-bound function that is expected to take 50 ms or longer needs to expose an async wrapper.

Exception: Stream.ReadAsync

When .NET 4.5 was ported to WinRT, this method was problematic. Stream exposes ReadAsync in the Stream base class, but some types of streams don’t actually support that. In those cases the safest option was to use Task.Run in the base Stream class.

Fortunately FileStream and NetworkStream override this behavior and offer true async. And for other types such as MemoryStream, it just makes more sense to perform the read synchronously.

Don’t Wrap Asynchronous Methods with Synchronous Methods that use Wait

When application developers see a synchronous version of a method that is also offered asynchronously, they are going to make some assumptions about it. One assumption is that the synchronous version is going to be faster. If it isn’t, then there is no reason to offer both. Instead they can just call Task.Wait themselves on the result of the asynchronous version.

Another assumption that the developer is going to make is that it is safe to run the method on the UI thread, assuming of course that they understand the context (e.g. number of items being loaded) and are willing to accept the latency. But if they do this, and the “synchronous” method uses Task.Wait, it will deadlock the UI.

What happens here is that the await keyword, by default, indicates that the rest of the function should be continued in the same context. In the case of the UI thread, that means the continuation is posted to the Dispatcher and has to wait its turn to run on the UI thread. But it will never get a turn, because the call to Task.Wait is blocking that thread while it waits for the code after the await to finish.

Libraries generally shouldn’t block on async

Be a responsible library developer. Instead of trying for symmetry, only offer synchronous methods if they truly are synchronous. And only offer asynchronous methods if they truly are asynchronous.
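A minimal sketch of the wrapper being warned against, using the hypothetical names LoaderSketch, LoadAsync and Load. The blocking call here is Task.Result, which blocks the calling thread in the same way as Task.Wait.

```csharp
using System.Threading.Tasks;

public static class LoaderSketch
{
    // Hypothetical asynchronous library method. There is no ConfigureAwait(false),
    // so the code after the await resumes on the captured context.
    public static async Task<string> LoadAsync()
    {
        await Task.Delay(50);
        return "loaded";
    }

    // Anti-pattern: a "synchronous" overload that just blocks on the async one.
    // On a UI thread, the blocked thread is the one the continuation in
    // LoadAsync needs, so neither side can make progress.
    public static string Load()
    {
        return LoadAsync().Result;
    }
}
```

Called from a thread that has no SynchronizationContext, such as a console application's main thread or a thread pool thread, Load happens to complete. That is why this kind of bug often goes unnoticed until the library is used from a UI application.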

Deadlocks and the SynchronizationContext

“The SynchronizationContext represents a target for work via its post method.” It has existed since .NET 2.0 but was never really used until .NET 4.5 introduced the async/await keywords. In WinForms this maps to Control.BeginInvoke while in XAML frameworks the post goes to the Dispatcher. There is even one for ASP.NET that ensures code isn’t run in parallel. In total there are about 10 implementations of SynchronizationContext in the .NET Framework and developers are welcome to write their own.

Whenever await is used, the ambient SynchronizationContext is captured. Post is called on this SynchronizationContext to resume the work once the async operation has completed. If there is no ambient SynchronizationContext, then the continuation is posted to the TaskScheduler.

For application-level code this is almost always the correct behavior. But for library code it is almost always the wrong thing to do. For library code you instead want to use this pattern:

await FooAsync().ConfigureAwait(false);

This will prevent it from capturing a SynchronizationContext and instead have it just continue running on whatever thread the OS gives it. Doing this offers two advantages:

In one demo Lucian showed that using ConfigureAwait(true), the default, was 14 times slower than ConfigureAwait(false). The per-call cost in absolute time is trivial, but when called in a tight loop hundreds of thousands of times it can start to add up.

More importantly, users may call Task.Wait on your asynchronous method while on the UI thread. This will cause a deadlock if your library didn’t use ConfigureAwait(false) and also tries to continue itself on the UI thread.

But why would they do this?

Async is really a virus.

If you use it at the bottom of your call stack, its caller is going to have to change its name and use async. And its caller’s caller is going to have to use async. And so forth, ideally all the way to the top of the stack. But in practice you may be under a framework or architecture that you can’t change and thus can’t use async. When that happens the application developer is forced to use Task.Wait or another horrible blocking call so that it can be consumed synchronously.

Which leads us to Lucian’s next principle,

The user’s thread doesn’t belong to you, it belongs to the user. So don’t pollute it with your library code.

So as a rule of thumb, libraries should always use ConfigureAwait(false) when awaiting a task.
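One way to observe the difference is to install a SynchronizationContext that counts how often a continuation is posted to it. The sketch below does that with a hypothetical CountingContext. Its Post forwards to the base implementation, which queues work to the thread pool, so blocking on the task inside this demo cannot deadlock the way a real UI context would.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context that counts the continuations posted to it.
public sealed class CountingContext : SynchronizationContext
{
    private int _posts;
    public int Posts { get { return Volatile.Read(ref _posts); } }

    public override void Post(SendOrPostCallback d, object state)
    {
        Interlocked.Increment(ref _posts);
        base.Post(d, state); // the base class queues the callback to the thread pool
    }
}

public static class ConfigureAwaitDemo
{
    // Application-style await: captures the ambient context.
    public static async Task<int> CapturingAsync()
    {
        await Task.Delay(10);
        return 1;
    }

    // Library-style await: does not capture the ambient context.
    public static async Task<int> LibraryStyleAsync()
    {
        await Task.Delay(10).ConfigureAwait(false);
        return 1;
    }

    // Runs the method under a CountingContext and reports how many
    // continuations were posted back to that context.
    public static int CountPosts(Func<Task<int>> method)
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        var context = new CountingContext();
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            method().Wait(); // safe only because CountingContext never needs this thread
            return context.Posts;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

In this sketch, CountPosts should report one post for CapturingAsync and none for LibraryStyleAsync. With a real UI context, that one post is the continuation that would be stuck behind the blocked UI thread.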

Performance and the ExecutionContext

This is another ambient context. It contains information such as the currently logged on user if you are doing impersonation or the current culture. It can be thought of as an alternative to thread local storage that continues to work while moving from thread to thread.

Async is optimized for the case where the ExecutionContext is left in its default state and not disturbed. If the application or library developer uses CallContext.LogicalSetData to store some data (e.g. for an async version of ambient transaction state) then they will add a small performance cost to every async call. In a trivial example it can add 60 to 100% to the cost. Again, this will probably not matter unless the async method is used in a tight loop with hundreds of thousands or millions of iterations.
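To illustrate ambient data that follows the logical flow of an async method rather than a particular thread, here is a small sketch. It uses AsyncLocal<T> (available in .NET Framework 4.6 and later) rather than the CallContext API mentioned above, and AmbientSketch and CurrentUser are hypothetical names.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class AmbientSketch
{
    // AsyncLocal<T> keeps its value in the ExecutionContext, so the value
    // travels with the async flow instead of being tied to one thread.
    private static readonly AsyncLocal<string> CurrentUser = new AsyncLocal<string>();

    public static async Task<string> ReadUserAfterAwaitAsync(string user)
    {
        CurrentUser.Value = user; // the ExecutionContext is no longer in its default state

        // The continuation will most likely run on a different thread.
        await Task.Delay(10).ConfigureAwait(false);

        return CurrentUser.Value; // still the value that was set before the await
    }
}
```

Setting the value is exactly the kind of change that takes the ExecutionContext out of its default state, so a library that does this on a hot path is accepting the extra per-call cost described above.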

More on the Performance Model

As mentioned in yesterday’s report titled Async/Await – Performance Overheads and Other Pitfalls, async methods have an inherent cost that you have to pay for Task creation and exception management. For a trivial method this cost can be 10x that of a trivial synchronous method.

Since this is only a problem when calling async methods in a tight loop with millions of iterations, encourage the user of your library to not do that. Give them “chunky” APIs where they can call your async method infrequently and do more work per call.
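As a hedged illustration of the difference, the sketch below offers the same hypothetical operation as a chatty per-item method and as a chunky batch method. ChunkySketch and both method names are made up for this example, and Task.Delay stands in for a real asynchronous operation.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ChunkySketch
{
    // Chatty shape: the caller pays the async overhead once per value.
    public static async Task<int> ReadValueAsync(int key)
    {
        await Task.Delay(1).ConfigureAwait(false);
        return key * key;
    }

    // Chunky shape: one async call, and one Task, for the whole batch.
    public static async Task<int[]> ReadValuesAsync(IReadOnlyList<int> keys)
    {
        await Task.Delay(1).ConfigureAwait(false);
        var values = new int[keys.Count];
        for (int i = 0; i < keys.Count; i++)
        {
            values[i] = keys[i] * keys[i];
        }
        return values;
    }
}
```

A caller that needs a million values makes a million async calls with the first shape, but only as many calls as it has batches with the second.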

Trivia: Using the await keyword can actually be slightly faster than duplicating the functionality by manually using callbacks. This is because the people who developed the compiler code for async/await have a deep understanding of the JIT compiler and access to special functionality on the Task type that is not publicly exposed.

Memory is a Global Resource

Unnecessary memory allocations can have a significant impact on an application’s performance. And since the cost of the allocations is deferred until the garbage collector runs, the cost isn’t easily associated with the code that generated it.

A typical async method call involves three memory allocations:

A state machine for storing the local variables as fields

A delegate for the continuation

A task for returning the result

The state machine and delegate are only created if the await keyword is actually encountered at run time. If your common code path is designed to avoid the awaits, then you avoid two of the three allocations.

An example of this would be a GetInt method that reads from some stream. If inside the method your internal await call reads a thousand bytes at a time and drops them in a buffer, then you would only hit said await call once per 250 calls to your function.
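A buffered reader along those lines might look roughly like this. BufferedIntReader is a hypothetical class written for illustration, not the code from the talk. It assumes four-byte integers in the machine's byte order and ignores a trailing partial integer at the end of the stream.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public sealed class BufferedIntReader
{
    private readonly Stream _stream;
    private readonly byte[] _buffer = new byte[1000]; // room for 250 four-byte integers
    private int _position;
    private int _length;

    public BufferedIntReader(Stream stream)
    {
        _stream = stream;
    }

    public async Task<int> GetIntAsync()
    {
        if (_position == _length)
        {
            // Slow path: the only place where an await is reached.
            _length = await FillAsync().ConfigureAwait(false);
            _position = 0;
            if (_length == 0) throw new EndOfStreamException();
        }

        // Fast path: served from the buffer without reaching an await.
        int value = BitConverter.ToInt32(_buffer, _position);
        _position += 4;
        return value;
    }

    // Reads until the buffer is full or the stream ends, and returns the
    // number of usable bytes, rounded down to a whole number of integers.
    private async Task<int> FillAsync()
    {
        int total = 0;
        while (total < _buffer.Length)
        {
            int read = await _stream
                .ReadAsync(_buffer, total, _buffer.Length - total)
                .ConfigureAwait(false);
            if (read == 0) break; // end of stream
            total += read;
        }
        return total - (total % 4);
    }
}
```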

That’s a huge reduction in memory consumption, but it still leaves the Task object. But that too can be optimized in some cases. As it happens, a completed Task is immutable. This means the runtime can cache Task objects containing common return values such as 1, 0, true, false, null, and the empty string.

Obviously the runtime cannot cache all possible return values, but your library code may be amenable to caching the one that it most frequently uses. Consider doing it in cases where your return value is an enumeration or other value from a finite range. Or when reading cacheable data from a web service.
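For a return type with a small, finite range, such a cache can be sketched as follows. The Status enumeration, the StatusTasks class and the load parameter are all hypothetical, and the method deliberately is not marked async so that it can hand back a cached, already completed Task.

```csharp
using System.Threading.Tasks;

public enum Status { Unknown, Ready, Busy }

public static class StatusTasks
{
    // One completed Task per enumeration value, allocated once and reused.
    private static readonly Task<Status>[] Cached =
    {
        Task.FromResult(Status.Unknown),
        Task.FromResult(Status.Ready),
        Task.FromResult(Status.Busy),
    };

    // Returns a cached task, so the call itself allocates nothing.
    public static Task<Status> GetStatusAsync(int load)
    {
        Status status = load == 0 ? Status.Ready : Status.Busy;
        return Cached[(int)status];
    }
}
```

Because a completed Task is immutable, handing the same instance to many callers is safe. Callers can still await the result exactly as they would a freshly allocated Task.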

About the Author

Jonathan Allen has been writing news reports for InfoQ since 2006 and is currently the lead editor for the .NET queue. If you are interested in writing news or educational articles for InfoQ please contact him at jonathan@infoq.com.

About “Don’t use Task.Run in Libraries”, what's the alternative when such use is inevitable?

For example, I wrote an application framework/library which uses other libraries that don't support async APIs yet. But I need to wrap them in my own APIs in an async manner, as they're actually I/O bound and I don't want callers to carelessly cause the client UI to freeze or stop responding.