Reimplementing LINQ to Objects: Part 5 – Empty

Continuing with the non-extension methods, it’s time for possibly the simplest LINQ operator around: "Empty".

What is it?

"Empty" is a generic, static method with just a single signature and no parameters:

publicstatic IEnumerable<TResult> Empty<TResult>()

It returns an empty sequence of the appropriate type. That’s all it does.

There’s only one bit of interesting behaviour: Empty is documented to cache an empty sequence. In other words, it returns a reference to the same empty sequence every time you call it (for the same type argument, of course).

What are we going to test?

There are really only two things we can test here:

The returned sequence is empty

The returned sequence is cached on a per type argument basis

I’m using the same approach as for Range to call the static method, but this time with an alias of EmptyClass. Here are the tests:

… but we want to obey the (somewhat vaguely) documented caching aspect too. It’s not really hard, in the end. There’s a very handy fact that we can use: empty arrays are immutable. Arrays always have a fixed size, but normally there’s no way of making an array read-only… you can always change the value of any element. But an empty array doesn’t have any elements, so there’s nothing to change. So, we can reuse the same array over and over again, returning it directly to the caller… but only if we have an empty array of the right type.

At this point you may be expecting a Dictionary<Type, Array> or something similar… but there’s another useful trick we can take advantage of. If you need a per-type cache and the type is being specific as a type argument, you can use static variables in a generic class, because each constructed type will have a distinct set of static variables.

Unfortunately, Empty is a generic method rather than a non-generic method in a generic type… so we’ve got to create a separate generic type to act as our cache for the empty array. That’s easy to do though, and the CLR takes care of initializing the type in a thread-safe way, too. So our final implementation looks like this:

That obeys all the caching we need, and is really simple in terms of lines of code… but it does mean you need to understand how generics work in .NET reasonably well. In some ways this is the opposite of the situation in the previous post – this is a sneaky implementation instead of the slower but arguably simpler dictionary-based one. In this case, I’m happy with the trade-off, because once you do understand how generic types and static variables work, this is simple code. It’s a case where simplicity is in the eye of the beholder.

Conclusion

So, that’s Empty. The next operator – Repeat – is likely to be even simpler, although it’ll have to be another split implementation…

Addendum

Due to the minor revolt over returning an array (which I still think is fine), here’s an alternative implementation:

The array method is not so great: People will incorrectly depend on the return value being an array although this is not documented. That is why linq queries of the form “from x in y select x” will still do a .Select(x=>x) in order to hide the original sequence. Handing out the original sequence would make it easy to take a false dependency.

The point is that people will start depending on undocumented properties and when your implementation changes they will break. This is not only their problem but yours too, because they will think that it is your mistake.
If your point was true then why is microsoft doing all that compatibility work? They do that because people will think microsoft is to blame when a buggy app breaks. The same happens when users install a new version of your lib and suddenly their app stops working.

@tobi: So whenever a method is declared to return IEnumerable[T], do you believe it should *never* return an instance of a concrete class? How about IList[T]? Yes, people shouldn’t take dependencies on the implementation – but that doesn’t mean it’s wrong to return *some* public class which implements an interface.

Out of interest, have you logged a Connect request suggesting that MS shouldn’t return an array from Empty in their implementation?

Who can say “never” and really mean it? In my own projects I do that all the time because I am willing to risk it (it is after all a very small risk). However if you are microsoft and have 3M devs use your product, you do not want to risk it. I believe that this is what their internal guidelines actually say because you find this pattern all over the BCL (there is even a TrueReadOnlyCollection which does the same thing but is inaccessible). I also believe the current Empty() impl. to be a mistake but propably the pattern was too elegant _not_ to implement it ;-)
This is also the reason why many old BCL methods take only Arrays and not lists or enumerables because taking interfaces means that the BCL method will execute your code in the middle of it (reentrancy problems). There is a whole science to avoiding compatibility problems (read raymond chens blog to see on what absurd properties people depend in real-life).
No, I did not log a Connect request.