Tribulations of CanBuildFrom

Tuesday 30 May 2017

Julien Richard-Foy

CanBuildFrom is probably the most
infamous abstraction of the current collections. It is mainly criticised for producing scary type
signatures.

Our ongoing collections redesign (blog post, GitHub repo) is an opportunity
to try alternative designs. This post explains the (many!) problems solved by CanBuildFrom
and the alternative solutions implemented in the new collections.

Transforming the elements of a collection

It’s useful to think of String as a collection of Char elements: you can then use
the common collection operations like ++, find, etc. on String values.

However, the map method is challenging because it
transforms the Char elements into something that might or might not be Chars.
What, then, should the return type of the map method on String values be? Ideally,
we want to get back a String if we transform each Char into another Char, but we
want to get some Seq[B] if we transform each Char into a different type B. And this
is how it currently works.
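A short sketch of this behaviour, using only the standard library (the object name StringMapDemo is introduced here purely for illustration):

```scala
object StringMapDemo {
  // Char => Char: the result is still a String
  val upper: String = "foo".map(c => c.toUpper)

  // Char => Int: the result is an indexed sequence of Int
  val codes: IndexedSeq[Int] = "foo".map(c => c.toInt)
}
```

The type annotations are only there to make the two result types visible; the compiler infers them as well.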

This feature is not limited to the map method: flatMap, collect, concat and a few
others work the same way. Moreover, String is not the only
collection type that needs this feature: BitSet
and Map are other examples.

The current collections rely on CanBuildFrom to implement this feature. For String, the map
method is defined as follows:

    def map[B, That](f: Char => B)(implicit bf: CanBuildFrom[String, B, That]): That

When the implicit CanBuildFrom parameter is resolved, it fixes the return type That.
The resolution is driven by the actual B type: if B is Char then That is fixed
to String, otherwise it is fixed to immutable.IndexedSeq[B].
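The mechanism can be imitated in a few lines. The sketch below is not the library code: Builds, LowPriorityBuilds and mapString are hypothetical, simplified stand-ins for CanBuildFrom and the String map method, meant only to show how the resolved implicit instance determines the result type.

```scala
object CanBuildFromSketch {
  // Hypothetical, simplified stand-in for CanBuildFrom[From, Elem, To]
  trait Builds[From, Elem, To] {
    def fromElems(elems: Vector[Elem]): To
  }

  trait LowPriorityBuilds {
    // Fallback instance: any element type B yields an IndexedSeq[B]
    implicit def toIndexedSeq[B]: Builds[String, B, IndexedSeq[B]] =
      new Builds[String, B, IndexedSeq[B]] {
        def fromElems(elems: Vector[B]): IndexedSeq[B] = elems
      }
  }

  object Builds extends LowPriorityBuilds {
    // Instance for Char elements: rebuild a String
    implicit val charsToString: Builds[String, Char, String] =
      new Builds[String, Char, String] {
        def fromElems(elems: Vector[Char]): String = elems.mkString
      }
  }

  // Same shape as the CanBuildFrom-based map: the resolved instance fixes That
  def mapString[B, That](s: String)(f: Char => B)(implicit bf: Builds[String, B, That]): That =
    bf.fromElems(s.toVector.map(f))

  val shouted: String = mapString("foo")(c => c.toUpper)
  val codes: IndexedSeq[Int] = mapString("foo")(c => c.toInt)
}
```

Note that the caller never mentions Builds: the result type only becomes known once the implicit instance has been found, which is exactly why the signature is hard to read.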

The drawback of this solution is that the type signature of the map method looks cryptic.

In the new design we solve this problem by defining two overloads of the map
method: one that handles Char to Char transformations, and one that handles other
transformations. The type signatures of these map methods are straightforward:

    def map(f: Char => Char): String
    def map[B](f: Char => B): Seq[B]

Then, if you call map with a function that returns a Char, the first overload is
selected and you get a String. Otherwise, the second overload is selected and you
get a Seq[B]. Before Scala 2.12 such a solution would not have worked well: users
would have been required to explicitly write the type of the argument of the supplied
f function. In Scala 2.12 type inference has been improved so that this is no
longer necessary.
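To see the overload selection at work, here is a minimal sketch. Str is a hypothetical wrapper introduced for this example, not the actual class of the new collections; only the two map signatures mirror the ones described above.

```scala
object OverloadSketch {
  // Hypothetical wrapper standing in for the String operations; not the real API
  final class Str(val value: String) {
    // Char => Char transformations keep a String
    def map(f: Char => Char): String = {
      val sb = new StringBuilder
      value.foreach(c => sb.append(f(c)))
      sb.toString
    }

    // Any other result type yields a Seq[B]
    def map[B](f: Char => B): Seq[B] = value.toList.map(f)
  }

  val s = new Str("foo")

  // No parameter type is written for c: it is inferred as Char for both overloads
  val upper: String = s.map(c => c.toUpper)
  val codes: Seq[Int] = s.map(c => c.toInt)
}
```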

Thus, we got rid of the cryptic method signatures while still supporting the feature
of returning a different type of result according to the type of the transformation function.

Collections’ type constructors with different arities

The collections are hierarchically organized. Essentially, the most generic collection
is Iterable[A], and then we have three main kinds of collections: Seq[A], Set[A]
and Map[K, V].

It is worth noting that Map[K, V] takes two type parameters (K and V) whereas the
other collection types take only one type parameter. This makes it difficult to
generically define, at the level of Iterable[A], operations that will
return a Map[K, V] when specialized.

For instance, consider again the case of the map method. We want to generically define
it on Iterable[A], but which return type should we use? When this method is
inherited by List[A] we want its return type to be List[B], but when
it is inherited by HashMap[K, V], we want its return type to be HashMap[L, W].
It is clear that we want to abstract over the type constructor of the concrete collections,
but the difficulty is that they don’t always take the same number of type parameters.

That’s a second problem solved by CanBuildFrom in the current collections.
Look again at the type signature of the (generic) map method on Iterable[A]:

    def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That

The return type That is inferred from the resolved CanBuildFrom instance at call-site.
Both the Repr and B types actually drive the implicit resolution: when Repr is List[_]
the parameter That is fixed to List[B], and when Repr is HashMap[_, _] and B is a
tuple (K, V) then That is fixed to HashMap[K, V].

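In the new design we solve this problem by defining two “branches” in the hierarchy:
IterableOps[A, CC[_]] for collections whose type constructor takes one type parameter, and
MapOps[K, V, CC[_, _]] for collections whose type constructor takes two.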

The HashMap[K, V] concrete collection then extends MapOps[K, V, HashMap] to set
its correct self-type constructor. Note that MapOps extends IterableOps: consequently it
inherits its map method, which will be selected when the transformation function
passed to map does not return a tuple.
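The observable behaviour can be illustrated with the standard Map. This sketch only shows which result type each call produces; it says nothing about how the hierarchy is implemented, and the object and value names are chosen for the example.

```scala
object MapBranchDemo {
  val m: Map[Int, String] = Map(1 -> "a", 2 -> "b")

  // The function returns a tuple: the result is again a Map
  val swapped: Map[String, Int] = m.map { case (k, v) => (v, k) }

  // The function does not return a tuple: the result is a plain Iterable
  val scaledKeys: Iterable[Int] = m.map { case (k, _) => k * 10 }
}
```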

Sorted collections

The third challenge is about sorted collections (like TreeSet and TreeMap, for instance).
These collections define their order of iteration according to an ordering relationship for the
type of their elements.

As a consequence, when you transform the type of the elements (e.g. by using the – now familiar! –
map method), an implicit ordering instance for the new type of elements has to be available.

With CanBuildFrom, the solution relies (again) on the implicit resolution mechanism:
the implicit CanBuildFrom[TreeSet[_], X, TreeSet[X]] instance is available for some
type X only if an implicit Ordering[X] instance is also available.

In the new design we solve this problem by introducing a new branch in the hierarchy.
This one defines transformation operations that require an ordering instance for the element
type of the resulting collection:

    trait SortedIterableOps[A, CC[_]] {
      def map[B: Ordering](f: A => B): CC[B]
    }
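As an illustration of the ordering requirement, the following sketch maps over the standard immutable TreeSet. Version and versionOrdering are hypothetical names introduced for the example.

```scala
import scala.collection.immutable.TreeSet

object SortedMapDemo {
  // An Ordering[Int] is available implicitly, so the result is a sorted set again
  val lengths: TreeSet[Int] = TreeSet("ccc", "a", "bb").map(s => s.length)

  // Hypothetical element type that has no ordering of its own
  final case class Version(major: Int)

  // Without this implicit instance, mapping into a TreeSet[Version] would not compile
  implicit val versionOrdering: Ordering[Version] =
    Ordering.by((v: Version) => v.major)

  val versions: TreeSet[Version] = TreeSet(3, 1, 2).map(n => Version(n))
}
```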

However, as mentioned in the previous section, we need to also abstract over the kind of the
type constructor of the concrete collections. Consequently we have in total four branches:

kind      | not sorted  | sorted
----------|-------------|------------------
CC[_]     | IterableOps | SortedIterableOps
CC[_, _]  | MapOps      | SortedMapOps

In summary, instead of having one map method that supports all the use cases described in
this section and the previous ones, we specialized the hierarchy to have overloads of
the map method, each one supporting a specific use case. The benefit is that the type
signatures immediately tell you the story: you don’t have to look at the actual
implicit resolution to know the result you will get from calling map.

Implicit builders

In the current collections, the fact that CanBuildFrom instances are available in the
implicit scope is useful to implement, separately from the collections, generic operations
that work with any collection type.

In the new design we are still experimenting with solutions to support these features. So far
the decision is to not put implicit builders in the collections implementation. We might
provide them as an optional dependency instead, but it seems that most of these use cases
could be supported even without implicit builders: you could just use an existing collection
instance and navigate through its companion object (providing the builder), or you could just
use the companion object directly to get a builder.
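The last option could look like the sketch below. The squares function is a hypothetical generic operation written for this example; it receives a builder explicitly instead of finding one in the implicit scope, and the caller obtains that builder from the companion object of the desired collection.

```scala
import scala.collection.mutable.Builder

object BuilderDemo {
  // Hypothetical generic operation: the caller chooses the target collection
  // by passing a builder taken from that collection's companion object
  def squares[C](n: Int, builder: Builder[Int, C]): C = {
    (1 to n).foreach(i => builder += i * i)
    builder.result()
  }

  val asList: List[Int] = squares(3, List.newBuilder[Int])
  val asSet: Set[Int] = squares(3, Set.newBuilder[Int])
}
```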

breakOut escape hatch

As we have previously seen, in the current collections when we want to transform some
collection into a new collection, we rely on an available implicit CanBuildFrom
instance to get a builder for the target collection. The implicit search is
driven by the type of the initial collection and the type of elements of the target
collection. The available implicit instances have been designed to make sense in the most
common cases.

However, sometimes this default behavior is not what you want. For instance, consider the
following program:

If you try to compile it you will get a compile error because the implicitly
resolved builder produces a List[(Int, Int)] instead of the desired Map[Int, Int].
We could convert this List[(Int, Int)] into a Map[Int, Int] but that
would be inefficient for large collections.
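A program of the kind described here might look as follows. The names xs and xsWithSquares are assumptions made for this sketch; the rejected line is left as a comment so that the rest compiles.

```scala
object BreakOutMotivation {
  val xs: List[Int] = List(1, 2, 3)

  // Rejected by the compiler: map on a List builds a List[(Int, Int)],
  // which is not a Map[Int, Int]
  // val xsWithSquares: Map[Int, Int] = xs.map(x => (x, x * x))

  // Accepted, but it first builds an intermediate List[(Int, Int)]
  // and then copies every pair into a Map
  val xsWithSquares: Map[Int, Int] = xs.map(x => (x, x * x)).toMap
}
```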