Preview of Immutable Collections Released on NuGet

Over the last years .NET added many features to make writing multithreaded applications easier. This includes the Task Parallel Library (TPL) as well as the new async/await keywords features to reduce the friction when writing asynchronous code. However, it’s still challenging to keep mutable state under control when multiple threads are involved. A common approach is to make use of immutable state that can be passed freely between different threads. Unfortunately, implementing immutable data structures can be quite hard, especially when collections are involved. In this post, Andrew Arnott, a developer from the Visual Studio team, will talk about a set of collection types we developed to make that easier. –Immo

Background and reasons for immutable collections

The .NET Framework already includes many highly-tuned generic collections. And in several cases read-only interfaces or wrapper classes exist as well. So how are immutable collections different and why do we need them? In short, because read-only collection types are merely facades over collections that can change.

If someone hands you a ReadOnlyCollection<T>, an IReadOnlyList<T>, or an IEnumerable<T> the only guarantee is that you can’t change the data – there is no guarantee that the person that handed you the collection won’t change it. Yet, you often need some confidence that it won’t change. These types don’t offer events to notify you when their contents change, and if they do change, might that occur on a different thread, possibly while you’re enumerating its contents? Such behavior would lead to data corruption and/or random exceptions in your application.

In contrast, if someone provides you with data via an immutable type, the guarantee is that the data will never change. You can hold a reference to it as long as you want, enumerate it, search it, etc., from any thread at any time, and be completely confident that you’ll get the expected behavior.

There are several kinds or levels of immutability, in fact. These immutable collections never allow mutation, yet offer mutating methods that return new collections, while sharing data across versions very efficiently. You can read up more on various types of immutability from Eric Lippert blog post Immutability in C# Part One: Kinds of Immutability.

Why should I use immutability?

There are many good reasons to use immutable collections. Here are a few:

Snapshot semantics, allowing you to share your collections in a way that the receiver can count on never changing.

Any time you have a class member that accepts or returns a collection type and you want to include read-only semantics in the contract.

Functional programming friendly.

Allow modification of a collection during enumeration, while ensuring original collection does not change.

They implement the same IReadOnly* interfaces that your code already deals with so migration is easy.

Introducing the immutable collections

The NuGet package preview includes these types:

ImmutableStack<T>

ImmutableQueue<T>

ImmutableList<T>

ImmutableHashSet<T>

ImmutableSortedSet<T>

ImmutableDictionary<K, V>

ImmutableSortedDictionary<K, V>

Interfaces for each of these types are also defined to facilitate exchange of immutable collection types that may be implemented differently to optimize for very specific performance or memory requirements.

I’ll look at how you might use ImmutableList<T>. First, I’ll start by acquiring an empty immutable list of fruit:

var emptyFruitBasket = ImmutableList<string>.Empty;

Notice that rather than a public constructor we provide a static Empty property. Using a static Empty property is a departure from a well-entrenched pattern, and there is a reason for it. A constructor must allocate an object, yet given the immutability of these collections, a new object could only ever represent an empty list. Since mutating the collection creates a new collection, using a constructor would cause the creation of multiple objects to represent an empty list, which would be wasteful. A static property allows us to return an empty list singleton which all code in your app can share.

An empty list isn’t that interesting. Next I will add some elements to it. As just stated, this is an immutable list, so when I request a change, a new list is allocated and the results are assigned to some variable:

var fruitBasketWithApple = emptyFruitBasket.Add("Apple");

The Add method returned a new list with one element, and I assigned this list to a new variable. The original emptyFruitBasket list is still empty. This is similar to System.String, which has an Empty static property, and mutating methods that always return new strings that include the requested change.

You may not want to come up with unique names for new local variables on each assignment, so you can simply reassign to the same local variable:

We do not simply copy the contents of the immutable collections with each change. This would result in poor performance with collections of any interesting size, occupy too much memory and stress the GC. Instead, we built these immutable collections to share as much memory between instances as possible. Following is a contrived extreme case of a list with 10,000 elements:

How much memory have we used between both hugeList and hugeListPlus1? Only slightly more than for hugeList alone. We did this by storing the immutable collections in immutable binary tree structures instead of flat arrays. When we create a list based on an existing list, we start with the existing binary tree and change just a few nodes in it, allocating new objects for each changed node, and return a list that points to the new root node of the binary tree. So hugeList and hugeListPlus1 actually share almost all their memory. If the app releases its reference to either list, the data unique to that particular list is immediately a candidate for garbage collection, so we don’t create memory leaks.

Introducing the Builders

Sometimes you need to “mutate” an immutable collection many times as you’re initializing it. Each top-level mutation call into an immutable collection results in allocating a few new nodes in that binary tree. All those extra allocations can add up in terms of GC pressure and performance. The same problem exists when you mutate strings repeatedly, which you can avoid by using the StringBuilder in those cases. Immutable collections use the same approach as strings. All the immutable collections that create GC pressure from many mutations provide ToBuilder() methods. The builder classes implement the same mutable interfaces that you’re used to, allowing you to pass these builders around as needed to initialize or update your collections. Then you can restore immutability by calling ToImmutable(). Here’s an example:

/// <summary>Maintains the set of customers.</summary>
private ImmutableHashSet<Customer> customers;
public ImmutableHashSet<Customer> GetCustomersInDebt()
{
// Since most of our customers are debtors, start with
// the customers collection itself.
var debtorsBuilder = this.customers.ToBuilder();
// Now remove those customers that actually have positive balances.
foreach (var customer in this.customers)
{
if (customer.Balance >= 0.0)
{
// We don’t have to reassign the result because
// the Builder modifies the collection in-place.
debtorsBuilder.Remove(customer);
}
}
return debtorsBuilder.ToImmutable();
}

These builders are mutable collections that use exactly the same data structure as the immutable collections, but without the immutability requirement, allowing them to be very efficient about their memory use as you add or remove elements. In the above sample, the ToBuilder() did not copy the customers collection, but rather keeps using the existing immutable collection you already had. When the builder collection is modified, it allocates a new reference to the collection that points to a slightly modified collection, which preserves the original this.customers collection unmodified. Finally, when ToImmutable() is called, the builder “freezes” its modified internal structure and returns the new reference to the immutable collection. Important note: No immutable collections are modified in the execution of the above example, nor was the entire collection copied at any time.

You don’t always need to use builders though. If you can mutate your collections with just one method call then a builder isn’t necessary. For example, the above sample could be rewritten without builders using a lambda:

The above example is just as efficient as the Builder approach. However, not all bulk modifications to immutable collections are expressible as a single method call with a lambda, and you can use the Builders in those cases.

Performance characteristics

How do immutable collections perform relative to their well-known mutable BCL counterparts? These immutable collections have been heavily tuned for maximum performance and minimum GC pressure. They’re very fast, and in most cases they will be appropriate to use in your applications.

When they are typically faster

Immutable collections’ performance shines best when your application already requires snapshot semantics, whether for thread-safety or just for the simplicity it offers your codebase. For example, you might have code like this in your application, in order to provide some “snapshot” semantics or provide thread-safety:

Every time this property is queried, it makes an entire copy of your internal collection. Replacing this mutable collection with an immutable collection can dramatically improve this scenario because returning a reference of the current immutable collection implicitly retains the snapshot semantic and no copying (or locking) is required. It becomes:

Switching to immutable collections in some components of Visual Studio has helped us dramatically improve the performance of some features.

When they might be slower

There are some noteworthy differences in algorithmic complexity between mutable and immutable collection types:

Mutable (amortized)

Mutable (worst case)

Immutable

Stack.Push

O(1)

O(n)

O(1)

Queue.Enqueue

O(1)

O(n)

O(1)

List.Add

O(1)

O(n)

O(log n)

HashSet.Add

O(1)

O(n)

O(log n)

SortedSet.Add

O(log n)

O(n)

O(log n)

Dictionary.Add

O(1)

O(n)

O(log n)

SortedDictionary.Add

O(log n)

O(n log n)

O(log n)

Why are the immutable types more prone to O(log n) time where mutable types are O(1)? It’s a consequence of the internal data structures that allow sharing across instances of the collections. For instance a mutable HashSet<T> allocates a single array, and stores elements in that array at the index determined by their hash code. This providers O(1) lookup and storage times. If an ImmutableHashSet<T> used this same kind of data storage, then every mutation would require reallocating and copying the entire array just to change one entry, which would quickly tank performance and GC pressure as the collections grew large. Instead, these immutable collections use immutable balanced binary trees to store their data, allowing very efficient data sharing across mutated instances of the collections. As a result however, every lookup or change requires a tree traversal that has (at worst) an algorithmic complexity of O(log n).

Another point to consider when considering algorithm complexity however is worst case. Mutable collections can outgrow their initially allocated capacity and have to reallocate and copy their contents into a larger array. These immutable collections grow organically so they don’t experience the O(n) worst case on growth that mutable collections do. The hash-based immutable collections can still degrade to O(n) time on hash collisions just like mutable collections however.

Memory characteristics

The immutable collections spend a bit more memory per element for storage than their mutable counterparts, but without the slack space that mutable collections tend to have in their arrays. This means you win some, but you lose some too. Given a single instance of a collection, your actual mileage may vary as far as how its memory consumption compares between the two types.

When removing elements from your collection, the mutable collections generally don’t shrink their arrays so you don’t reclaim that memory. Immutable collections immediately shrink their binary trees for each removed element so the memory is a candidate for garbage collection.

How can I get started?

Call for feedback

The NuGet package will be in a prerelease form for a brief time, giving you an opportunity to tell us what you think so we can make necessary changes. Add comments to this post, or contact the authors of our NuGet package.

Frequently asked questions

Q: Can I only store immutable data in these immutable collections? A: You can store all types of data in these collections. The only immutable aspect is the collections themselves, not the items they contain.

Q: Do Builders let you modify an immutable collection? A: No. Immutable collections can never be modified. Builders give you the ability to start and finish a series of mutations with immutable instances, but in between use mutability for greater efficiency. Both the collections you start and finish with are in fact immutable.

Q: Is it safe to use immutable collections in a multi-threaded environment? A: Yes. All truly immutable object graphs in .NET are safe in a multi-threaded environment.

Q: Mutable collections are fine for me, aren’t they? I’m not into F# or a math buff. How can immutable collections help me? A: Immutable collections don’t entirely supersede the mutable collections found in the .NET Framework. You can and often should continue to use them as you have before. To learn more about immutability and what its benefits are, you may be interested in reading Read only, frozen, and immutable collections.

Q: I’ve heard that immutable collections are slow. Are these any different? Can I use them when performance or memory is important? A: These immutable collections have been highly tuned to have competitive performance characteristics to the mutable collections while balancing memory sharing. In some cases they are very nearly as fast as the mutable collections both algorithmically and in actual time, sometimes even faster, while in other cases they are algorithmically more complex. In many cases however the difference will be negligible. Generally you should use the simplest code to get the job done and then tune for performance as necessary. The immutable collections help you write simple code, especially when thread-safety has to be considered.

Q: I want to try these immutable collections out but I need a convenient escape hatch in case their performance doesn’t meet my requirements. What can I do? A: Use the new .NET 4.5 readonly interfaces on all your public APIs, such as IReadOnlyCollection<T> and IReadOnlyList<T>. In your implementation you may use immutable collections, which implement these interfaces. If in the end you need to switch to mutable collections, you’ll be able to do so without any breaking changes.

Q: I’m totally jazzed about these immutable collections. Where can I find out more? A: There are lots of fun details about optimizations these collections have, best practices and anti-practices, etc. that don’t fit in this post. But these details may appear in future posts if you follow the immutability tag on my blog here on MSDN.

@xoofx We plan to release the final, supported version of these collections as a NuGet package so that it continues to be available on .NET 4.5 and later (and other SKUs). There aren't any plans yet to work with MSR but that's definitely an interesting possibility going forward.

While SL5 (or .NET 4 or WP) doesn't support IReadOnlyList<T> etc is should be a pice of cake for you to supply a nuget package where you just define the interface in the assembly (there is no requirement to actually implement all the other readonly collections).

You could also just use #if/#endif to hide that you implement the interface (given you use implicit implementation of the functions), but this would probably have a much larger impact on the code.

Update to my earlier post, the Microsoft.Bcl NuGet package (or any realated) looks like a good place to add the interface

Philipp K

20 Dec 2012 2:12 AM

Very nice article! Minor comment: I would have liked to see a more complete table on the algorithmic complexity, for example for dequeue, pop, etc.

Joe White

20 Dec 2012 5:06 AM

(apologies if this is double-posted; the server timed out the first time I tried to submit)

Questions that came to mind while I was reading this post:

1. Is the Add method annotated with the [Pure] attribute? You mentioned the risk of calling it and forgetting to do anything with the return value; if it's got the [Pure] attribute, then tools like ReSharper can give you a warning if you don't use the return value. And logically, it is a pure method; it doesn't modify any state on the list.

2. I understand why you don't want a parameterless constructor, but shouldn't there should be a constructor / factory method that takes an IEnumerable<T>? Sure, you could use .Empty.AddRange, but that's kind of obtuse.

3. Is part of the reason for "no constructors" because of concerns about collection-initializer syntax? These implement IEnumerable<T> and have a public Add method; so if there was a public constructor, the compiler would let you do new ImmutableList{a,b,c} which would translate to three calls to Add, but would throw away the results. This seems like a pretty compelling reason not to have any public constructors on these classes, though I'm surprised you didn't call it out in the post.

4. You call out the performance of add methods, but what about reads? What's the perf of the indexer (list[i])? (I'd expect O(log n) if it's a tree, but that's not good for a lot of applications so I'd expect you to have called it out if that was the case.) What about perf of iteration (GetEnumerator)?

5. Are there / will there be convenience extension methods akin to .ToList()?

You're absolutely right that we could use NuGet's framework targeting and #ifdef's to support downlevel platforms. Current thinking is that since the collections are expected to be used as exchange types so having one portable library that works everywhere (or as close to it as possible) seems much more valuable than multiple builds of the DLL that will prevent interop as people mix and match libraries that were compiled against different versions of the framework.

#2 Thanks for the feedback and please keep it coming. We'll consider it. One concern I have is that an initializing constructor without a default constructor might be more confusing, and even lead folks to pass in empty arrays just to initialize a collection if they're not led to discover the static Empty property.

#3 Avoiding collection initializer confusion wasn't one of the reasons we considered (as far as I recall anyway) but it does sound like a valid one.

#4 Good point. Yes, the ImmutableList<T> indexer has O(log n) performance for a single lookup. The enumerator is O(1) for each step. So if you're walking the whole list, using the enumerator is far more efficient. Leaving that out of the post was an oversight.

#5 Yes, a handful of extension methods are included.

Stephen Cleary

20 Dec 2012 10:17 AM

Glad to see this! I've already dl'ed and poked around a bit (and subscribed to your blog, Andrew).

One thing I'd like to see is well-documented amortized/worst performance for each operation or at least an overview of the internal data structures for each collection, so that it's clearer which collection to choose for a particular problem. Names like "List" invoke a mental image of the data structure and performance tradeoffs that is inaccurate.

My other suggestion echos what others have said: maximize the supported platforms. I think MS.Bcl should support IReadOnly* (not hard to do with type forwarding on newer platforms) and then ImmutableCollections would build on that.

@Everyone Who has asked for broader platform support: please visit visualstudio.uservoice.com/.../31481-net and cast votes for your favorite feature requests. At the moment, there is only one immutable collection request there. Please check what's there when you visit and vote or create a new item to vote for. :)