Generic Covariance and Contravariance in C# 4.0

Covariance and contravariance are precise terms that describe which conversions are safe on parameters and return types. Learn practical definitions for those terms, what new constructs will be supported in C# 4.0 and how to live with the current limitations until Visual Studio 2010 is adopted by your organization.

In this month's installment of C# Corner, we look at one of the new features in C# 4.0, the next version of C# that will be delivered in Visual Studio (VS) 2010. Looking ahead to specific features in C# 4.0 can help you plan how to take advantage of the new capabilities in your coding efforts. It can also help you understand problems that are occurring in your programs today that the next version of C# may be able to resolve. Ultimately, the exercise will help you build a business case for adopting C# 4.0 based on existing knowledge of how it can help you.

This month I'll examine covariance, contravariance and how the C# language will be addressing those features with version 4.0 in VS 2010.

Invariance, Covariance and Contravariance Explained
Before we go further, let's examine what invariant, covariant, and contravariant parameters and return types mean. You almost certainly are familiar with the terms, even if you don't have a grasp of the formal definitions.

A return value or parameter is invariant if you must use a type that exactly matches the formal type. A parameter is covariant if you can use a more derived type as a substitute for the formal parameter type. A return value is contravariant if you can assign the return value to a variable of a less derived type than the formal return type.

In most cases, C# supports covariant parameters and contravariant return types. That's consistent with almost every other object-oriented language. In fact, polymorphism is usually built around the concepts of covariance and contravariance. You intuitively know that you can pass a derived class object to any method expecting a base class object. After all, the derived object is also an instance of the base type. You instinctively know that you can store the result of a method in a variable of a less derived type than the formal method return type.
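A minimal sketch of both substitutions follows. The Animal and Dog classes and the Describe and CreateDog methods are hypothetical names chosen for illustration, not types from any library:

```csharp
using System;

// Hypothetical types, for illustration only.
public class Animal
{
    public virtual string Name { get { return "animal"; } }
}

public class Dog : Animal
{
    public override string Name { get { return "dog"; } }
}

public static class SubstitutionSketch
{
    // Declared to accept the base type.
    public static string Describe(Animal animal)
    {
        return "This is a " + animal.Name;
    }

    // Declared to return the derived type.
    public static Dog CreateDog()
    {
        return new Dog();
    }

    public static string Run()
    {
        // A more derived argument is accepted where the base type is expected.
        string text = Describe(new Dog());

        // The result can be stored in a variable of a less derived type.
        Animal pet = CreateDog();
        Console.WriteLine(pet.Name);

        return text;
    }
}
```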

This works because parameter types are covariant in C#. Similarly, you can store the result of any method in a variable of type object, because return types in C# are contravariant:

object value = SomeMethod();

If you've done any work with C# or Visual Basic .NET since .NET was first released, all this is familiar ground. However, the rules change as you begin to look at any type that represents a collection of objects. In too many ways, what you intuitively think should work just doesn't. As you dive deeper, you may find that what you believe is a bug is actually the behavior the language specification requires. Now it's time to explain why collections work differently, and what's changing in the future.

Object-Based Collections
The .NET 1.x collections (ArrayList, Hashtable, Queue and so on) could be treated as covariant. Unfortunately, they're not safely covariant. In fact, they're invariant. However, because they store references to System.Object, they appear to be covariant and contravariant. A few examples quickly illustrate the issue.

You could believe these collections act as covariant because you can create an ArrayList of Employee objects and use that as a parameter to any method that uses objects of type ArrayList. Often, that approach works just fine. This method would work with any ArrayList:
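A minimal sketch of such a method, assuming the hypothetical name SafeUse (the method body is an illustration, not the original listing):

```csharp
using System;
using System.Collections;

public static class SafeUseSketch
{
    // SafeUse is a hypothetical name used for illustration.
    public static void SafeUse(ArrayList items)
    {
        // Enumerate the collection without assuming any element type.
        foreach (object item in items)
            Console.WriteLine(item);

        // Reverse the items in place: positions change, types do not.
        for (int i = 0, j = items.Count - 1; i < j; i++, j--)
        {
            object temp = items[i];
            items[i] = items[j];
            items[j] = temp;
        }
    }
}
```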

This method is safe because it doesn't change the type of any object in the collection. It enumerates the collection, and it moves items already in the collection to different indices in the collection. However, none of the types change, so this method will work in all instances.

However, ArrayList and the other classic .NET 1.x collections cannot be considered safely covariant. Look at this method:
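Here is one way such a method could be written. This is a sketch under the assumption that UnsafeUse simply replaces each element with its string form and that the list holds no null entries:

```csharp
using System.Collections;

public static class UnsafeUseSketch
{
    // Replaces every element with its string representation.
    // Assumes the list contains no null entries.
    public static void UnsafeUse(ArrayList items)
    {
        for (int i = 0; i < items.Count; i++)
            items[i] = items[i].ToString();
    }
}
```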

It's making deeper assumptions about the objects stored in the collection. After the method exits, the collection contains objects of type string. That may not have been the type of the original collection. In fact, if the original collection contained strings, the method does nothing. Otherwise, it transforms the collection into a different type.

The following usage example shows the kinds of problems encountered when you call this method. Here, a list of numbers is sent to UnsafeUse, where it's transformed into an ArrayList of strings. After that call, the calling code tries again to create the sum of the items, which now causes an InvalidCastException.
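The calling sequence might look like the following sketch. The Sum helper and the SecondSumThrows wrapper are hypothetical names, and UnsafeUse is repeated here (as an assumed implementation) so the example stands on its own:

```csharp
using System;
using System.Collections;

public static class UnsafeUseDemo
{
    // Assumed implementation: replaces each element with its string form.
    public static void UnsafeUse(ArrayList items)
    {
        for (int i = 0; i < items.Count; i++)
            items[i] = items[i].ToString();
    }

    // The foreach performs an explicit cast to int on every element.
    public static int Sum(ArrayList numbers)
    {
        int total = 0;
        foreach (int n in numbers)
            total += n;
        return total;
    }

    public static bool SecondSumThrows()
    {
        ArrayList numbers = new ArrayList();
        for (int i = 1; i <= 5; i++)
            numbers.Add(i);

        Console.WriteLine(Sum(numbers));   // the list still holds ints

        UnsafeUse(numbers);                // the list now holds strings

        try
        {
            Sum(numbers);                  // the cast to int fails here
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```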

This example shows that while classic collections are invariant, you could, for all practical purposes, treat them as though they were covariant (or contravariant). But these collections are not safely covariant. The compiler does nothing to keep you from making mistakes in how you treat the objects in a classic collection.

Arrays
When used as a parameter, arrays are sometimes invariant, and sometimes covariant. Once again, just like the classic collections, arrays are not safely covariant.

First and foremost, only arrays containing reference types can be treated as either covariant or contravariant. Arrays of value types are always invariant. That's true even when trying to call a method that expects an object array. This method can be called with any array of reference types, but you cannot pass it an array of integers or any other value type:
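A sketch of such a method, using the hypothetical name WriteAll, with the calls that do and do not compile:

```csharp
using System;

public static class ObjectArraySketch
{
    // WriteAll is a hypothetical name; any method taking object[] behaves the same way.
    public static int WriteAll(object[] items)
    {
        foreach (object item in items)
            Console.WriteLine(item);
        return items.Length;
    }

    public static int Run()
    {
        string[] words = { "alpha", "beta", "gamma" };

        // OK: string[] converts implicitly to object[].
        int written = WriteAll(words);

        // Compile-time error if uncommented: int[] does not convert to object[].
        // WriteAll(new int[] { 1, 2, 3 });

        return written;
    }
}
```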

As long as you constrain yourself to reference types, arrays are covariant and contravariant. However, they're not safely covariant or safely contravariant. The more often you treat arrays as covariant or contravariant, the more you'll find that you need to handle ArrayTypeMismatchException. Let's examine some of the ways.

Array parameters are covariant, but not safely covariant. Examine this dangerous method:
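A sketch of what such a method could look like, assuming a base class B with two unrelated derived classes, D and D2 (the class bodies are placeholders):

```csharp
using System;

public class B { }
public class D : B { }
public class D2 : B { }

public static class ArrayCovarianceSketch
{
    // Legal as far as the compiler is concerned: a D2 is a B,
    // so it can be stored in something declared as B[].
    public static void DestroyCollection(B[] storage)
    {
        for (int i = 0; i < storage.Length; i++)
            storage[i] = new D2();
    }
}
```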

The following calling sequence will cause the loop to throw an ArrayTypeMismatchException:

D[] array = new D[]
{
    new D(),
    new D(),
    new D(),
    new D(),
    new D(),
    new D(),
    new D(),
    new D(),
    new D(),
    new D()
};
DestroyCollection(array);

The reason is obvious when you see the two blocks together. The call site creates an array of D objects, then calls a method that expects an array of B objects. Because arrays are covariant, you can pass the D[] to the method expecting B[]. But inside DestroyCollection(), the array can be modified. In this case, the method creates new objects of type D2 and stores them in the array. That's fine in the context of that method: D2 objects can be stored in a B[] because D2 is derived from B. But the storage is really a D[], and a D2 is not a D, so the runtime rejects the assignment.

The same thing happens when you introduce some method that returns the array storage and treat that as a contravariant value. This code looks like it would work fine:

B[] storage = GenerateCollection();
storage[0] = new B();

However, if the body of GenerateCollection looks like this, it will cause an ArrayTypeMismatchException when the storage[0] element is set to a B object:
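A sketch of a GenerateCollection body that has this problem, assuming B is a base class and D derives from it. The StoreThrows wrapper is a hypothetical helper that shows the failure:

```csharp
using System;

public class B { }
public class D : B { }

public static class GenerateSketch
{
    // Declared to return B[], but the storage it allocates is really a D[].
    public static B[] GenerateCollection()
    {
        return new D[] { new D(), new D(), new D() };
    }

    public static bool StoreThrows()
    {
        B[] storage = GenerateCollection();
        try
        {
            // Compiles, because storage is declared as B[].
            // Fails at runtime, because a B is not a D.
            storage[0] = new B();
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }
}
```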

Generic Collections
Arrays suffer from being treated as covariant and contravariant, even when that's not safe. The .NET 1.x collection types are invariant, but store references to System.Object, which isn't type safe in any practical sense. The generic collections in .NET 2.0 and beyond suffer from being invariant. That means you cannot ever substitute a collection containing a more derived object type where a collection containing a less derived type is expected. That's a lengthy way to say that a lot of substitutions you expect to work don't. You'd think that you could write a method like this:
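A sketch of such a method, using the WriteObjects(IEnumerable<object> items) signature that appears later in this article. The body is an assumption:

```csharp
using System;
using System.Collections.Generic;

public static class WriterSketch
{
    // Uses only members of System.Object, so any element type would do.
    public static void WriteObjects(IEnumerable<object> items)
    {
        foreach (object item in items)
            Console.WriteLine(item);
    }
}
```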

You'd think you could call it with any collection that implements IEnumerable<T> because any T must derive from object. That may be your expectation, but because generics are invariant, the following will not compile:
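A sketch of the failing call, assuming the method has the signature WriteObjects(IEnumerable<object> items). The offending line is commented out so the rest of the example compiles:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InvariantCallSketch
{
    public static int WriteObjects(IEnumerable<object> items)
    {
        int count = 0;
        foreach (object item in items)
        {
            Console.WriteLine(item);
            count++;
        }
        return count;
    }

    public static int Run()
    {
        IEnumerable<int> items = Enumerable.Range(1, 50);
        Console.WriteLine(items.Count());

        // Compile-time error if uncommented:
        // IEnumerable<int> cannot be converted to IEnumerable<object>.
        // WriteObjects(items);

        // An exact match on the closed generic type is accepted.
        IEnumerable<object> exact = new List<object> { 1, "two", 3.0 };
        return WriteObjects(exact);
    }
}
```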

You can't treat generic collection types as contravariant, either. This line will not compile because you cannot convert IEnumerable<int> into IEnumerable<object> when assigning the return value:

IEnumerable<object> moreItems =
Enumerable.Range(1, 50);

You might think IEnumerable<int> derives from IEnumerable<object>, but it doesn't. IEnumerable<int> is a Closed Generic Type based on the Generic Type Definition for IEnumerable<T>. IEnumerable<object> is another Closed Generic Type based on the Generic Type Definition for IEnumerable<T>. One does not derive from the other. There's no inheritance relationship, and you cannot treat them as covariant. Even though there's an inheritance relationship between the two type parameters (int and object), there's no corresponding inheritance relationship between any generic types using those type parameters. It feels like the conversion should work, but strictly speaking, the compiler is behaving correctly.

C# treating generics invariantly has some very powerful advantages. Most significantly, you can't make the mistakes I demonstrated earlier with arrays and the 1.x style collections. Once you get generic code to compile, you've got a much better chance of making it work correctly. That's consistent with C#'s heritage as a strongly typed language that leverages the compiler to remove possible bugs in your code.

However, that heavy reliance on strong typing feels restrictive. Both of the constructs I just showed for generic conversions feel like they should just work. And yet, you don't want to revert to the same behavior used for the .NET 1.x collections and arrays. What we really want is to treat generic types as covariant or contravariant only when it works, not when it simply substitutes a runtime error for a compile time error.

C# 4.0 Syntax
C# 4.0 will introduce a set of language changes that remove this surprising restriction on closed generic types without opening yet another breach in the type safety system. The new syntax lets you treat some generic types as safely covariant and safely contravariant.

The C# 4.0 language specification uses the terms "output safe" and "input safe" to describe type parameters that are safely covariant or contravariant, respectively. Those terms are somewhat more descriptive, if less precise, ways to explain how covariance and contravariance work. A couple of examples will make it clearer.

The familiar IEnumerable<T> and IEnumerator<T> interfaces are output safe. Therefore, both interfaces can be treated as covariant. Furthermore, those interfaces are safely covariant. In C# 4.0, both of those interfaces have been annotated with the new "out" contextual keyword to indicate that they can be treated as safely covariant:
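The annotated declarations look roughly like this. The sketch places them in a hypothetical VarianceSketch namespace so they compile alongside the real types in System.Collections.Generic:

```csharp
using System;
using System.Collections;

namespace VarianceSketch
{
    // T appears only as (part of) a return type.
    public interface IEnumerable<out T> : IEnumerable
    {
        new IEnumerator<T> GetEnumerator();
    }

    // T appears only as the type of a read-only property.
    public interface IEnumerator<out T> : IDisposable, IEnumerator
    {
        new T Current { get; }
    }
}
```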

Notice the addition of the "out" contextual keyword. That signifies that T is covariant. It will compile cleanly because T appears only in output positions, and is output safe. That means, beginning with C# 4.0, you can use an IEnumerable<string> where the formal parameter list expects an IEnumerable<object>. The earlier example:

WriteObjects(IEnumerable<object> items)

can be called with a List<string>, or with a sequence of any other reference type.
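A short sketch of that call. It assumes a C# 4.0 compiler targeting .NET 4.0, and the WriteObjects body is an illustration:

```csharp
using System;
using System.Collections.Generic;

public static class CovariantCallSketch
{
    public static int WriteObjects(IEnumerable<object> items)
    {
        int count = 0;
        foreach (object item in items)
        {
            Console.WriteLine(item);
            count++;
        }
        return count;
    }

    public static int Run()
    {
        List<string> names = new List<string> { "alpha", "beta" };

        // List<string> implements IEnumerable<string>, which converts
        // to IEnumerable<object> because T is declared with "out".
        return WriteObjects(names);
    }
}
```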

Similarly, some interfaces contain type parameters that are input safe. For example, IComparable<T> has been updated to note that it's input safe:

public interface IComparable<in T>
{
    int CompareTo(T other);
}

Throughout IComparable<T>, T appears only in input positions. Furthermore, those input positions are never annotated with the ref or out modifier. T is therefore input safe and can be treated as contravariant. Because T only appears in input locations, a less derived type can be used in the formal parameter list where a more derived type is used as the actual parameter. An IComparable<object> can be used where you expect an IComparable<string>.

Other interfaces are still invariant. ICollection<T> contains methods where T appears in input positions as well as methods where T appears in output positions:
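The shape of the interface is sketched below. It sits in a hypothetical CollectionSketch namespace so that it compiles next to the real System.Collections.Generic.ICollection<T>:

```csharp
using System.Collections;
using System.Collections.Generic;

namespace CollectionSketch
{
    // GetEnumerator(), inherited from IEnumerable<T>, uses T as an output.
    public interface ICollection<T> : IEnumerable<T>, IEnumerable
    {
        int Count { get; }
        bool IsReadOnly { get; }

        void Add(T item);                        // T as an input
        void Clear();
        bool Contains(T item);                   // T as an input
        void CopyTo(T[] array, int arrayIndex);
        bool Remove(T item);                     // T as an input
    }
}
```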

The Add() method is not output safe. The GetEnumerator() method is not input safe. Because the entire interface is neither input safe nor output safe, the ICollection<T> interface cannot be treated as either covariant or contravariant. Therefore, ICollection<T> will remain invariant.

There are quite a few limitations on covariance and contravariance in the C# language. Those limitations are meant to minimize potential runtime errors related to misuse of those features. The limitations can be easily remembered by a couple of broad guidelines.

First off, the "in" and "out" contextual keywords can only be applied to interface and delegate generic type definitions. You can't create covariant or contravariant generic class definitions. Therefore, MyClass<in T> is illegal.
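The following sketch shows where the annotations are accepted. IProducer, Handler and VarianceTargetsSketch are hypothetical names, and the example assumes a C# 4.0 compiler:

```csharp
using System;

// Legal: a variance annotation on a generic interface.
public interface IProducer<out T>
{
    T Produce();
}

// Legal: a variance annotation on a generic delegate.
public delegate void Handler<in T>(T value);

// Illegal if uncommented: variance annotations are not allowed on classes.
// public class MyClass<in T> { }

public static class VarianceTargetsSketch
{
    public static string Run()
    {
        string seen = null;
        Handler<object> anyHandler = delegate(object value) { seen = value.ToString(); };

        // Contravariant conversion: a handler of object can handle strings.
        Handler<string> stringHandler = anyHandler;
        stringHandler("hello");
        return seen;
    }
}
```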

The other limitations apply when you attempt to treat a particular type parameter as covariant or contravariant. Covariance and contravariance only apply when there's a reference conversion between the two specific type parameters. As I mentioned earlier, IEnumerable<string> can be used where IEnumerable<object> is expected. There's a reference conversion from string to object. However, IEnumerable<int> can't be used where IEnumerable<long> is expected. There's a conversion from int to long, but that's a widening conversion, not a reference conversion. In addition, IEnumerable<int> cannot be used where IEnumerable<object> is expected. Again, there is a conversion from int to object, but it's a boxing conversion, not a reference conversion.

In practice, the reference conversion rule means that you can only treat different closed generic types as covariant or contravariant when the type parameters are both reference types, and are related by some inheritance relationship.
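The rule can be sketched as follows, assuming a C# 4.0 compiler. The two rejected conversions are commented out so the example compiles:

```csharp
using System.Collections.Generic;

public static class ReferenceConversionSketch
{
    public static bool Run()
    {
        IEnumerable<string> words = new List<string> { "alpha", "beta" };

        // OK: string to object is a reference conversion.
        IEnumerable<object> objects = words;

        // Compile-time errors if uncommented:
        // IEnumerable<int> numbers = new List<int> { 1, 2, 3 };
        // IEnumerable<long> longs = numbers;    // numeric conversion, not a reference conversion
        // IEnumerable<object> boxed = numbers;  // boxing conversion, not a reference conversion

        // The covariant conversion creates no copy: both variables refer to the same list.
        return object.ReferenceEquals(words, objects);
    }
}
```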

In order to support covariance and contravariance, the .NET 4.0 BCL will have several generic interfaces and delegate types updated to be safely covariant and contravariant. As you learn more about Visual Studio 2010, take the time to learn about how those language extensions on the interfaces enable you to express designs in less code, and reuse more logic safely.

What Can You Do Now?
At this point, you may be asking how this matters. After all, VS 2010 is still a future technology, and it will be some time before it will make its way to the corporate developer.

The question is how to author your code today such that it can easily take advantage of the new covariant and contravariant additions when they become available. Knowledge of the new features will help you create code that's ready to accept the in or out contextual keywords on your interface and delegate types. It will become more important to factor those interfaces into input only and output only portions, so that your interfaces can support both covariance and contravariance, as appropriate. You should examine your generic interfaces and methods to see if the parameters and return values are input safe or output safe. That will make it easier to use them in either a covariant or contravariant manner in the near future.
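One way to factor an interface along those lines is sketched below. ISource, ISink and ItemBuffer are hypothetical names. The in and out keywords require C# 4.0, so under C# 3.0 you would write the same interfaces without them and add the keywords later:

```csharp
using System.Collections.Generic;

// Output-only portion: T appears only as a return type.
public interface ISource<out T>
{
    T Take();
}

// Input-only portion: T appears only as a parameter type.
public interface ISink<in T>
{
    void Put(T item);
}

// The class stays invariant, but callers can use either view of it.
public class ItemBuffer<T> : ISource<T>, ISink<T>
{
    private readonly Queue<T> items = new Queue<T>();

    public void Put(T item) { items.Enqueue(item); }
    public T Take() { return items.Dequeue(); }
}

public static class FactoringSketch
{
    public static object Run()
    {
        // Contravariant view: a sink of object accepts strings.
        ISink<string> sink = new ItemBuffer<object>();
        sink.Put("hello");

        // Covariant view: a source of strings is a source of objects.
        ItemBuffer<string> strings = new ItemBuffer<string>();
        strings.Put("world");
        ISource<object> source = strings;
        return source.Take();
    }
}
```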

A more immediate need is to be able to emulate the covariant and contravariant features using the current language elements. You can't replicate all the features; if you could, there would be no reason for the language team to add them. That said, you can get close in some usages.

There are two techniques that you can often use to mitigate the need for covariance and contravariance. You can use Cast<T> or you can create generic methods instead of covariant and contravariant methods.

The earlier WriteItems() method could easily be rewritten as a generic method:
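A sketch of the generic version. The method body and the Run wrapper are assumptions, and this form compiles under C# 3.0 as well:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GenericWriterSketch
{
    // The element type is inferred at each call site,
    // so no conversion between closed generic types is needed.
    public static int WriteItems<T>(IEnumerable<T> sequence)
    {
        int count = 0;
        foreach (T item in sequence)
        {
            Console.WriteLine(item);
            count++;
        }
        return count;
    }

    public static int Run()
    {
        int written = WriteItems(Enumerable.Range(1, 5));          // T is int
        written += WriteItems(new List<string> { "alpha", "beta" }); // T is string
        return written;
    }
}
```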

Now, you can call WriteItems() for any sequence, including a sequence of integers. In other uses, where your methods need capabilities beyond those methods in System.Object, you'll need to add constraints on the generic method, possibly even factoring out an interface contract as part of the generic method constraints. However, there will almost always be a way to create a generic method that can be used where you want to create a covariant method, or a contravariant method. When you are the author of the method, convert it to a generic method.

When you don't have access to the core method because it's in a third-party library, you can use the Cast<T> method in the specific case where you need to convert between IEnumerable<T> types for two different type parameters. Of course, this can occur only where a conversion between those types exists.

Remember that the original WriteItems() method, the one that takes an IEnumerable<object>, was coded this way:
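A sketch of that method, together with a call that goes through Cast<object>(). The method body is an assumption, and it stands in for a method you cannot modify:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CastSketch
{
    // Stands in for a method you cannot change.
    public static int WriteItems(IEnumerable<object> sequence)
    {
        int count = 0;
        foreach (object item in sequence)
        {
            Console.WriteLine(item);
            count++;
        }
        return count;
    }

    public static int Run()
    {
        IEnumerable<int> numbers = Enumerable.Range(1, 50);

        // WriteItems(numbers);   // does not compile

        // Cast<object>() produces an IEnumerable<object> over the same items.
        return WriteItems(numbers.Cast<object>());
    }
}
```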

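The Cast<T>() method enumerates the input collection, converting each element, and yields the converted elements as its output sequence. This technique applies only to sequences, but it works wherever you need a conversion between IEnumerable<T> types, provided that every element can actually be cast to the target type. If an element can't be cast, Cast<T>() throws an InvalidCastException when the sequence is enumerated.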

In this article, I've shown you the motivation behind the addition of generic covariance and contravariance in C# 4.0. They're being added because invariant generic types are too restrictive for many uses. There are covariant and contravariant conversions that we expect to work. In C# 3.0, those conversions don't work, to the surprise of many developers. The language team is addressing that in C# 4.0. In the meantime, there are ways to mitigate the need for those features.