
Sequence Conformance

So you want to conform to SequenceType. Maybe you’re even more ambitious than that: CollectionType, MutableSliceable, RangeReplaceableCollectionType – you want it all. What’s involved?

SequenceType

This is the most basic kind of sequence: anything that conforms to it can be iterated with a for...in loop, and gets some cool methods like map, flatMap, reduce, etc. It has one basic requirement: a generate() method, which returns something that conforms to GeneratorType.

GeneratorType

Generators are stateful objects that represent iteration. They must have one method – next() – which returns an optional. This optional should contain successive elements of the sequence until the sequence is finished, at which point it should be nil. (the protocol only requires the generator to return nil once: calling next() after it has returned nil is not allowed) Generators themselves can't be iterated with a for...in loop. If we take the example of iterating through an array, it's easy to see what's going on:
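Something like this – a sketch of what the for...in loop is doing for you under the hood:

```swift
// Roughly what a for...in loop desugars to when iterating an array
// (Swift 2 syntax).
let xs = [1, 2, 3]
var g = xs.generate()        // ask the sequence for a fresh generator
while let x = g.next() {     // successive elements...
    print(x)                 // 1, then 2, then 3
}
g.next()                     // ...and nil once the sequence is finished
```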

You’ll actually see this with a lot of the standard library generators. Also, because of default methods on protocols, anything that conforms to GeneratorType can be made to conform to SequenceType with no extra code:

extension SomeGenerator : SequenceType {}

Swift will fill in the method for you.
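For example – a hypothetical countdown generator, not from the standard library – the empty extension is genuinely all the extra code you need:

```swift
// A hypothetical generator. GeneratorType only requires next();
// the empty extension below then gives us SequenceType, because
// Swift provides a default generate() that just returns self.
struct Countdown: GeneratorType {
    var count: Int
    mutating func next() -> Int? {
        guard count > 0 else { return nil }
        defer { count -= 1 }
        return count
    }
}

extension Countdown: SequenceType {}

for i in Countdown(count: 3) {
    print(i) // 3, 2, 1
}
```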

AnyGenerator

Declaring a new type with all of the conformances and whatnot needed can be a bit of a pain, so Swift has a few helpers to make generators and sequences a bit easier.

The first is a class called AnyGenerator. It’s worth noting that it’s a class, rather than a struct: since generators are stateful and reference-y, a class is a better semantic fit. (However, the std lib generators are all structs: this is because a struct allows for much faster iteration. It’s worth keeping in mind if you’re defining your own sequence where performance may be important)

Making your own generator with AnyGenerator is easy, though. You can wrap it around some other generator, or you can use a closure that represents the next() method:
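Something along these lines – the closure form uses the free function anyGenerator from Swift 2.0/2.1 (it became an AnyGenerator initialiser in 2.2), and the Fibonacci example here is my own, not the post's:

```swift
// Wrapping an existing generator:
let base = [1, 2, 3].generate()
let wrapped: AnyGenerator<Int> = anyGenerator(base)

// Or building one from a closure that acts as next(): here, an
// infinite Fibonacci generator.
var (a, b) = (0, 1)
let fibs = anyGenerator { () -> Int? in
    (a, b) = (b, a + b)
    return a
}
fibs.next() // 1
fibs.next() // 1
fibs.next() // 2
```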

But now we get an error: type 'List' does not conform to protocol 'SequenceType'. Because we've overridden one default protocol method, we lose all the others. Swift was expecting several methods for slicing – dropFirst, dropLast, prefix, etc. – and from those it would infer the SubSequence typealias. When we were using the default methods, the type inferred was AnySequence. Not anymore: now it's List, so when Swift looks for the other slicing methods that return a List, it can't find them, and it tells you you're not conforming to the protocol. This means that if you want to implement one, you've got to implement them all. Ugh:
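For reference, these are the slicing requirements that all have to agree on SubSequence (the signatures as I read them in Swift 2's SequenceType – check your std lib version):

```swift
// Swift 2's SequenceType slicing requirements. Implement one with a
// custom return type, and all the others must match, since they
// share the SubSequence typealias.
func dropFirst(n: Int) -> SubSequence
func dropLast(n: Int) -> SubSequence
func prefix(maxLength: Int) -> SubSequence
func suffix(maxLength: Int) -> SubSequence
func split(maxSplit: Int, allowEmptySlices: Bool,
           @noescape isSeparator: (Generator.Element) throws -> Bool)
    rethrows -> [SubSequence]
```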

ForwardIndexType is the base protocol for all indices – BidirectionalIndexType and RandomAccessIndexType both conform. All it needs is to be Equatable, and have a successor() method. The variables and subscript can give us a clue as to how IndexingGenerator might work:
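A sketch of how it might be implemented (my reconstruction, not the std lib source), using only what CollectionType guarantees: startIndex, endIndex, the subscript, and an Equatable index with successor():

```swift
// A plausible IndexingGenerator, built from CollectionType's
// requirements alone.
struct MyIndexingGenerator<C: CollectionType>: GeneratorType {
    let collection: C
    var position: C.Index

    init(_ collection: C) {
        self.collection = collection
        self.position = collection.startIndex
    }

    mutating func next() -> C.Generator.Element? {
        if position == collection.endIndex { return nil }
        defer { position = position.successor() }
        return collection[position]
    }
}
```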

This also sheds a little light on why Swift does “one-past-the-end” indexing. (the endIndex variable represents the index after the last index, confusingly) Since indices don’t have to be comparable, you couldn’t write IndexingGenerator like this:
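That is, you couldn't write a next() that relies on comparison – something like this (a deliberately non-compiling sketch):

```swift
// The hypothetical Comparable-based version. It won't compile:
// ForwardIndexType only requires Equatable, so '<' isn't defined
// for C.Index in general.
mutating func next() -> C.Generator.Element? {
    guard position < collection.endIndex else { return nil } // error: no '<'
    defer { position = position.successor() }
    return collection[position]
}
```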

So you're stuck with only being able to use ==. If endIndex were the last index, then you'd only be able to tell that you were on the last element, not that you had passed it. Now, you could use flags (a finished boolean, or something), but that's less efficient.

There’s one more interesting aspect of IndexingGenerator: it’s used everywhere. Arrays, ContiguousArrays, ArraySlices: they all use IndexingGenerator. That’s kind of interesting: you’d think that since SequenceType is the base protocol, the iteration is handled first, and the indexing is built on top of that. For the array types, at least, it’s the other way around: iteration is built on indexing. You get some interesting performance characteristics as a result.

Here’s an example. enumerate() might not work the way you’d expect it to: it doesn’t return a tuple of (index, value) for a given collection. Instead, it returns a tuple of (Int, value): that means that if you enumerated over a String, for instance, you couldn’t actually index back into the string with the first member of the tuple:
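For instance:

```swift
// enumerate() pairs each element with a plain Int counter, not with
// the collection's own index type.
let word = "hello"
for (n, c) in word.characters.enumerate() {
    // n is an Int, not a String.CharacterView.Index, so this
    // doesn't compile:
    // let same = word.characters[n]  // error: cannot subscript with Int
    print(n, c)
}

// To index back in, you need the real indices:
for i in word.characters.indices {
    let c = word.characters[i]       // fine
    print(c)
}
```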

The first one must be faster, right? I mean, the second one has an index lookup, which must be slow. All the first one does is advance an index alongside each element.
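The two versions being compared might look roughly like this – a reconstruction with hypothetical names, pairing each element of a collection with its real index:

```swift
extension CollectionType {
    // Version 1: walk the elements, advancing an index alongside.
    func indexed1() -> [(Index, Generator.Element)] {
        var result: [(Index, Generator.Element)] = []
        var i = startIndex
        for element in self {
            result.append((i, element))
            i = i.successor()
        }
        return result
    }

    // Version 2: walk the indices, looking each element up.
    func indexed2() -> [(Index, Generator.Element)] {
        return indices.map { ($0, self[$0]) }
    }
}
```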

But it's not. The second method is about twice the speed. Why? Well, when you iterate through self, you're using an IndexingGenerator. It's not just the array types, either: String acts like this, too. That's really surprising: surely String has really slow indexing, what with all the Unicode-correctness?

It looks like the index for String contains information about the memory location of the character itself (or about the character itself). So an index is far more than just a count of how many characters into the string you are – it's closely tied to the string's underlying storage.

This is important with regards to the standard library algorithms. Methods and functions on CollectionTypes expect that an index is the fastest way to get to any individual element. That means that making List indexable, like this:
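...with Int indices, say, is a bad idea. A sketch of what I assume the article's point looks like (the List here is my own minimal cons list):

```swift
// A minimal cons list.
enum List<Element> {
    case Nil
    indirect case Cons(Element, List<Element>)
}

// Indexing it with Ints means the subscript has to walk the list:
// self[i] is O(i), not O(1), which breaks the implicit contract that
// an index is the fastest way to reach an element.
extension List {
    subscript(i: Int) -> Element {
        switch self {
        case .Nil:
            fatalError("Index out of range")
        case let .Cons(head, tail):
            return i == 0 ? head : tail[i - 1]  // walks i nodes
        }
    }
}
```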

The number of functions available to you here gets to genuinely silly levels – but you only need three of the basics implemented: replaceRange, init(), and reserveCapacity. That is: a function that can replace an arbitrary range of elements with an arbitrary number of new elements, an initialiser that creates an empty instance, and… reserveCapacity.
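To see why replaceRange carries so much weight: most of the derived methods can be written in terms of it. A sketch (hypothetical names, to avoid clashing with the std lib's own defaults):

```swift
extension RangeReplaceableCollectionType {
    // Append: replace the empty range at the end with one element.
    mutating func append2(element: Generator.Element) {
        replaceRange(endIndex..<endIndex, with: [element])
    }

    // Remove: replace a one-element range with nothing.
    mutating func removeAtIndex2(i: Index) {
        replaceRange(i...i, with: EmptyCollection())
    }
}
```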

reserveCapacity

So this guy is interesting. Arrays in Swift aren't necessarily contiguous (unless you use ContiguousArray), but they're mostly contiguous. For this reason, it often makes sense to allocate the memory for the array up front – especially if you have a rough idea of how big it's going to be. For instance, map() will always produce an array the same length as the sequence mapped over, so it should call reserveCapacity before it fills the array. That information is difficult to get, though. For one, map() works on SequenceTypes, not just CollectionTypes, and SequenceType doesn't have a count property – so how can you know the length of the sequence? Even if it did, the count property on CollectionTypes returns Index.Distance, whereas reserveCapacity on an array needs an Int.

underestimateCount()

This is a method on all SequenceTypes. It's kind of a stand-in for count. It has one requirement: it must be nondestructive. (you can tell they're serious: "nondestructive" is in bold in the docs. The only other bold words are scary, important-sounding things like "raw memory" and "Equality implies substitutability") The default implementation returns 0. However, several of the std lib SequenceTypes are able to return the actual number of elements they contain: if you're writing a method on SequenceType, or something that conforms to SequenceType, bear this in mind.
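Putting the two together, a map-alike can sketch out the reserveCapacity trick like this (my version, not the std lib's actual implementation):

```swift
// A map that pre-allocates using underestimateCount(). Since the
// estimate is only a lower bound, append still handles any growth
// beyond it.
func myMap<S: SequenceType, T>(source: S, _ transform: S.Generator.Element -> T) -> [T] {
    var result: [T] = []
    result.reserveCapacity(source.underestimateCount())
    for element in source {
        result.append(transform(element))
    }
    return result
}
```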

Laziness

There’s a little bit more to this in the standard library than you might think. The standard SequenceTypes like Array, Set, and Dictionary are all strict. Then there are those that have lazy in the name: LazySequence, LazyForwardCollection, LazyBidirectionalCollection, and LazyRandomAccessCollection. These are lazy, obviously. But there’s a third group: sequences that are lazy, but are treated as if they’re strict: AnySequence, Zip2Sequence, and EnumerateSequence, among others. (when I say “treated” I mean that they get the strict versions of map and filter)

Why have strict, explicitly lazy, and implicitly lazy? Well, the implicitly lazy sequences above are all built out of some underlying sequence. They’re really just a wrapper. For efficiency’s sake, they won’t force evaluation on the underlying sequence: but for simplicity’s sake, they get the standard map and filter.

What’s the use of this laziness? Well, if you want to chain sequence methods together, as is the Swifty way, the strict versions will kill your performance. Every method will allocate, fill, and destroy an array. However, if your methods are lazy, there are no intermediate arrays created, and you can decide when you want to allocate and fill. (David Owens had a great post on this)

Great! So how do I go about using all of this lazy goodness? Well, for any of the std lib structs, just call the lazy function on them, and then all subsequent map and filter methods will be piped. However, LazySequence is a struct, not a protocol (yet…), so if you want your own custom fancy sequence to get the lazy methods, you’ll have to wrap it in LazySequence. (or one of the other lazy structs, where applicable)
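Roughly like this (using the free lazy function the post mentions; in later Swifts this is the .lazy property):

```swift
// Strict chaining builds an intermediate array at every step:
let strict = (1...1000).map { $0 * 2 }.filter { $0 % 3 == 0 }

// The lazy version fuses the steps: nothing is evaluated here...
let piped = lazy(1...1000).map { $0 * 2 }.filter { $0 % 3 == 0 }

// ...only here, in a single pass, with no intermediate arrays.
let result = Array(piped)
```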

If you want to write a method on lazy sequences, there are currently two options available to you:

You can write four versions of your method, one for each lazy sequence struct in the std lib.

Or you can define a new protocol, LazySequenceType, and have the std lib structs conform, like this:
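A sketch of that second option (the protocol name here is the article's own suggestion; this predates the std lib shipping its own LazySequenceType):

```swift
// A protocol the four lazy structs can retroactively conform to.
protocol LazySequenceType: SequenceType {}

extension LazySequence: LazySequenceType {}
extension LazyForwardCollection: LazySequenceType {}
extension LazyBidirectionalCollection: LazySequenceType {}
extension LazyRandomAccessCollection: LazySequenceType {}

extension LazySequenceType {
    // Lazy methods can now be written once, here, and all four
    // structs pick them up.
}
```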