
Yet Another Root of All Evil

It’s a great lecture. I’d recommend the whole thing. The main thrust of the talk is a theme that I’ve always had a little trouble with: performance. Performance is an odd topic in computer science: for something so obviously empirical, there’s an awful lot of (seeming) mysticism around some of the concepts. For beginners like me, it makes performance seem like an untouchable, too-difficult-for-you area.

Generally speaking, it is. Writing fast code is one of those things that takes experience. Algorithmic efficiency can be handled by a beginner: for many things, the perfect algorithm has already been discovered.

But performance is so much more complex than that. You can reason well about “how many steps” something takes, and then someone will whisper “cache miss” and you’re screwed.

So, in the lecture, linked lists got a lot of bashing. Head on down to 34:47 and you’ll see the quote on screen:

Discontiguous data structures are the root of all (performance) evil

Aww! I had thought the recursive lists were really cool! Apparently not. Apparently they’re cripplingly slow. (Airspeed Velocity also did a great post on this)

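The devil here lies in the discontiguity. (I hope that’s a word.) The data your code uses can come from different stores, and some are faster than others. As you can imagine, the faster the store, the smaller it is. Slowest of all is the hard drive or flash drive: 500GB+. Above that is RAM: 5GB+. Above that is where the fun begins: the caches. These hold small copies of parts of RAM. The largest cache is typically a few megabytes (3MB, say), and the fastest is only tens of kilobytes. Data moves from RAM into the caches in fixed-size chunks called cache lines (typically 64 bytes), so when you read one element of a contiguous array, its neighbours usually come along for free. (Great paper on this stuff here.)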

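So, if your data can all fit into the cache, you never have to go to RAM. When the data you need isn’t in the cache and has to be fetched from RAM, that’s a “cache miss”. How slow is it? Slow: a trip to main memory takes on the order of 100 nanoseconds, against roughly a nanosecond for a hit in the fastest cache. (There’s another video on this, too.)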

On top of that, though, array indexing relies on contiguity. If you know the memory footprint of each element of an array, and where the array begins in memory, getting to the nth element (counting from zero) is as easy as (memory footprint) * n + (array start). This should also make it obvious why chopping off the beginning of an array is expensive: every remaining element has to be shifted back by one place, so the cost grows with the length of the array.
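To make that formula concrete, here’s a minimal sketch in current Swift (not the Swift 2 beta the rest of this post uses). It asks an array for a pointer to its storage, computes each element’s address by hand as (memory footprint) * n + (array start), and compares that with ordinary pointer arithmetic. The function name addressFormulaHolds is just for illustration.

```swift
/// Checks, for every index n, that (memory footprint) * n + (array start)
/// is exactly the address where element n lives.
func addressFormulaHolds(for array: [Int]) -> Bool {
    return array.withUnsafeBufferPointer { buffer -> Bool in
        // An empty array has no elements whose addresses we could check.
        guard let base = buffer.baseAddress else { return true }
        let start = UInt(bitPattern: base)                 // array start, in bytes
        let footprint = UInt(MemoryLayout<Int>.stride)     // bytes per element, including padding
        return (0..<buffer.count).allSatisfy { n in
            let computed = footprint * UInt(n) + start     // the hand-made formula
            let actual = UInt(bitPattern: base + n)        // where Swift says element n is
            return computed == actual
        }
    }
}

let numbers = [10, 20, 30, 40, 50]
print(addressFormulaHolds(for: numbers))   // true
```

MemoryLayout<Int>.stride is used rather than size because stride includes any padding between consecutive elements, which is what the address arithmetic actually steps over.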

So the benefits of this system are obvious: fast iteration, fast indexing, and so on. But why do we use the start of an array as the basis for indexing? Why not the end? I couldn’t find an answer on Google, but it’s probably easy enough to guess: it’s more intuitive. I’d also wonder whether we operate on the beginning of arrays more often than on the end. I may be missing something very important, too.

There’s nothing to stop me from indexing from the end, though! Let’s make another “List” kind of thing, but this time based on a reversed array:

struct ContiguousList<Element> {
private var contents: [Element]
}

There’s a struct in the standard library that may be especially relevant here: ContiguousArray

A fast, contiguously-stored array of Element.

Efficiency is equivalent to that of Array, unless Element is a class or @objc protocol type, in which case using ContiguousArray may be more efficient. Note, however, that ContiguousArray does not bridge to Objective-C. See Array, with which ContiguousArray shares most properties, for more detail.

The standard Array doesn’t promise to be contiguous. When its Element is a class or @objc protocol type, an Array can be backed by an NSArray (after bridging from Objective-C, for instance), and then its storage isn’t guaranteed to be one contiguous block. However, I’m all about keeping arrays together today, so our list will look like this:
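Here’s a minimal sketch of that idea in current Swift, where SequenceType and CollectionType have become Sequence and Collection. The storage is a ContiguousArray kept in reverse, so the list’s first element is the storage’s last, and the subscript translates an index by counting back from the end. Conforming to RandomAccessCollection with Int indices also gets you an IndexingIterator for free, the current counterpart of Swift 2’s IndexingGenerator.

```swift
/// A list whose elements are stored back-to-front in one contiguous block,
/// so the list's first element is the last element of `contents`.
struct ContiguousList<Element> {
    private var contents: ContiguousArray<Element>

    init<S: Sequence>(_ elements: S) where S.Element == Element {
        contents = ContiguousArray(elements.reversed())
    }
}

extension ContiguousList: RandomAccessCollection {
    var startIndex: Int { 0 }
    var endIndex: Int { contents.count }

    /// List index i lives at storage index (count - 1 - i).
    subscript(i: Int) -> Element {
        contents[contents.count - 1 - i]
    }
}

let list = ContiguousList([1, 2, 3])
for x in list { print(x) }   // prints 1, 2 and 3 on separate lines, via the free IndexingIterator
```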

Then, if you just say: extension ContiguousList : SequenceType {}, you get an IndexingGenerator for free.

Then there’s more pretty standard stuff with regard to making it ArrayLiteralConvertible and so on, but we might want to provide our own implementations of certain functions, given that there may be a more efficient way to do it:

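There’s an extra function in there, too: removeFirst(). Because the list is stored in reverse, its first element is the last element of the underlying array, so removeFirst() is just removeLast() on the storage, with no shifting required. You could also add functions like prepend, dropFirst, etc.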
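A sketch of those extras in current Swift, where ArrayLiteralConvertible is now called ExpressibleByArrayLiteral. Because the storage is reversed, removeFirst() becomes removeLast() on the storage and prepending becomes append(_:), both cheap. The names prepend(_:) and elements are illustrative, not standard library API.

```swift
struct ContiguousList<Element> {
    // Stored back-to-front: the list's first element is contents.last.
    private var contents: ContiguousArray<Element> = []

    /// The list's elements in front-to-back order (an O(n) copy, handy for checking).
    var elements: [Element] { Array(contents.reversed()) }

    /// O(1): the list's first element is the last one in storage.
    @discardableResult
    mutating func removeFirst() -> Element {
        precondition(!contents.isEmpty, "Can't remove the first element of an empty list")
        return contents.removeLast()
    }

    /// Amortised O(1): a new first element goes on the end of storage.
    mutating func prepend(_ element: Element) {
        contents.append(element)
    }
}

extension ContiguousList: ExpressibleByArrayLiteral {
    init(arrayLiteral elements: Element...) {
        contents = ContiguousArray(elements.reversed())
    }
}

var list: ContiguousList<Int> = [1, 2, 3]
list.removeFirst()
list.prepend(0)
print(list.elements)   // [0, 2, 3]
```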

So shall we use this for our Deque? It’s how I did the last one – two singly-linked lists, each starting at either end.

Except the semantics get a little strange. What we’re using as our lists is already a reversed array, so the back would be reversed twice. There’s really no need for the double negative and extra layer of abstraction. So our Deque will just contain two ContiguousArrays, with the front reversed:
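A minimal sketch of that layout in current Swift. The front half is stored back-to-front, so both prepending and appending are appends onto the end of one array or the other. The names prepend(_:) and elements are illustrative, and there’s no rebalancing yet.

```swift
struct Deque<Element> {
    // front is stored reversed: the deque's first element is front.last.
    var front: ContiguousArray<Element> = []
    // back is stored in order: the deque's last element is back.last.
    var back: ContiguousArray<Element> = []

    /// Amortised O(1): pushes onto the end of the reversed front.
    mutating func prepend(_ element: Element) { front.append(element) }

    /// Amortised O(1): pushes onto the end of back.
    mutating func append(_ element: Element) { back.append(element) }

    /// Front-to-back order: the reversed front followed by back.
    var elements: [Element] { Array(front.reversed()) + Array(back) }
}

var deque = Deque<Int>()
deque.append(2)
deque.append(3)
deque.prepend(1)
deque.prepend(0)
print(deque.elements)   // [0, 1, 2, 3]
```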

There’s no didSet here, you might notice – we’re going to do all of that manually. We may know in some cases that we don’t need to check – in that case, we want to avoid the check() function.

I’ve decided to go a different way with that guy, this time. I wanted a little more abstraction: specifically, I wanted to separate the checking from the fixing. In this version of the Deque, I found myself testing a lot, so I wanted to be able to check that the Deque was still balanced after performing various functions on it.

In my tests for the old Deque I just wrote a small extension on Deque which returns a Boolean. A Boolean isn’t good enough for a check() function, though: if it says “not balanced”, your fix() still has to find out which side it’s unbalanced on.

So I went with an enum:

internal enum Balance {
case FrontEmpty, BackEmpty, Balanced
}

You’ll notice internal there. (fuller explanation here) My usual habit with access control is to make everything private, and then go back and change what I need to public. (In writing this post, the struct got oddly complex and large – I began to think an awful lot more about architecture-ish things. I realised that old strategies like that weren’t strategies at all) private is useful: it keeps things – shockingly – private. There are performance benefits, but also benefits to clarity.

Internal access does something similar: it blocks access from outside the module (the framework or app), but allows access from other files within it. There’s one other effect: testing. If you import a module into your test cases with @testable import, you can test its internal variables and functions as much as you want, without making them public.

So it all works rather nicely for the Deque. We can give it an internal variable describing its balance:
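Here’s one way the balance property and a separate check() could fit together, sketched in current Swift (so the enum cases are lowercase). The rebalancing rule is an assumption, since the post’s own code isn’t shown: the deque counts as unbalanced when one side is empty and the other holds two or more elements, and check() moves half of the full side across, reversing it on the way and reserving capacity first because the number of incoming elements is known.

```swift
internal enum Balance {
    case frontEmpty, backEmpty, balanced
}

struct Deque<Element> {
    var front: ContiguousArray<Element> = []   // stored reversed
    var back: ContiguousArray<Element> = []    // stored in order

    /// Assumed rule: unbalanced when one side is empty and the other has 2+ elements.
    internal var balance: Balance {
        switch (front.count, back.count) {
        case (0, 2...): return .frontEmpty
        case (2..., 0): return .backEmpty
        default: return .balanced
        }
    }

    /// Finds out which side needs fixing, then fixes it.
    mutating func check() {
        switch balance {
        case .balanced:
            return
        case .frontEmpty:
            // Move the first half of back into front, reversed.
            let half = back.count / 2
            front.reserveCapacity(half)   // we know exactly how many are coming
            front.append(contentsOf: back[..<half].reversed())
            back.removeFirst(half)
        case .backEmpty:
            // front's first stored elements are the deque's last ones: move half into back.
            let half = front.count / 2
            back.reserveCapacity(half)
            back.append(contentsOf: front[..<half].reversed())
            front.removeFirst(half)
        }
    }

    var elements: [Element] { Array(front.reversed()) + Array(back) }
}

var deque = Deque<Int>(front: [], back: [1, 2, 3, 4])
print(deque.balance)    // frontEmpty
deque.check()
print(deque.balance)    // balanced
print(deque.elements)   // [1, 2, 3, 4]
```

Keeping balance and check() separate means a test can assert on balance directly without mutating anything.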

At any rate, there’s another newbie here: reserveCapacity(). Arrays in Swift seem to have unlimited capacity, but they obviously can’t: each one has some amount of memory set aside for it. When you fill that space, Swift allocates a new, larger block (typically double the size), copies the elements across, and frees the old one. To avoid paying for that repeatedly, you can reserve the capacity you know you’ll need up front, and Swift tries to guess where it can: if you use a function like filter(), for instance, you know the filtered array can be no bigger than the sequence it was filtered from. Here, we know what size the reversed array is going to be: one less than the size of the other. I’m not sure whether Swift reserves that capacity automatically. It certainly could: ReverseRandomAccessCollection (the type returned by reverse()) has a count property it could use. Then again, ContiguousArray only has an initialiser for SequenceTypes, which don’t have a count property. Confusing it more, they do have an underestimateCount() method: maybe it uses this?
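A quick way to see reserveCapacity() at work in current Swift: once an array has room for n elements, appending up to n of them doesn’t need a reallocation, so its capacity stays put. The exact capacities you get are implementation details, so this sketch (with a hypothetical helper name) only relies on the documented guarantee that capacity is at least what you asked for.

```swift
/// Appends `n` elements after reserving room for them, and reports whether
/// the capacity stayed fixed the whole time (i.e. no reallocation happened).
func appendsWithoutRegrowing(_ n: Int) -> Bool {
    var numbers = ContiguousArray<Int>()
    numbers.reserveCapacity(n)          // ask for room for n elements up front
    let reserved = numbers.capacity     // guaranteed to be at least n
    for i in 0..<n {
        numbers.append(i)
        if numbers.capacity != reserved { return false }
    }
    return reserved >= n
}

print(appendsWithoutRegrowing(1_000))   // true
```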

Next, let’s make it indexable. startIndex is 0, and endIndex is the sum of front’s and back’s endIndex. That’s all fine, but what about the subscript? First, we need to check which array the index falls into (idx < front.endIndex means it’s in the front), and then translate it into a position in that array: for the reversed front, that’s front.endIndex - 1 - idx; for the back, it’s idx - front.endIndex.
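A sketch of that indexing logic in current Swift, on a stripped-down stand-in for the Deque. An index lands in the front when idx < front.endIndex; because the front is stored reversed, its position there is counted back from front’s end. Otherwise it’s an offset of idx - front.endIndex into the back.

```swift
struct Deque<Element> {
    var front: ContiguousArray<Element>   // stored reversed
    var back: ContiguousArray<Element>    // stored in order
}

extension Deque: RandomAccessCollection {
    var startIndex: Int { 0 }
    var endIndex: Int { front.endIndex + back.endIndex }

    subscript(idx: Int) -> Element {
        if idx < front.endIndex {
            // idx 0 is front's last stored element, idx 1 the one before it, ...
            return front[front.endIndex - 1 - idx]
        } else {
            // Past the front, count from the start of back.
            return back[idx - front.endIndex]
        }
    }
}

let deque = Deque(front: [2, 1, 0], back: [3, 4, 5])
print(Array(deque))   // [0, 1, 2, 3, 4, 5]
```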

Now we’ve got all of the basic conformances, so let’s replace some of the default functions with more efficient implementations. As of beta 5, there are some new difficulties here. Remember Sliceable? It left us in beta 4, folded into CollectionType. Beta 5 took things a step further: slicing now lives (kind of) in SequenceType. Take a look at the protocol:

There is way more going on here than before. Most importantly, there’s a new type alias: SubSequence. But we didn’t declare a SubSequence. Usually that’s fine: after all, we didn’t declare Generator. It was inferred from the generate() function. But we didn’t declare anything that returned Self.SubSequence – so where’s it inferring from? It gets it from the default implementation. The default implementation, as it happens, uses AnySequence.
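In current Swift the details have changed: SubSequence moved from Sequence to Collection in Swift 5, and its default is Slice<Self> rather than AnySequence. The inference works the same way, though. The hypothetical Squares collection below never mentions SubSequence, yet slicing it and calling dropFirst both produce Slice<Squares>, inferred from the default implementations.

```swift
/// The first `count` square numbers, computed on demand.
/// It never declares a SubSequence type; Swift infers one from the defaults.
struct Squares: RandomAccessCollection {
    let count: Int
    var startIndex: Int { 0 }
    var endIndex: Int { count }
    subscript(i: Int) -> Int { i * i }
}

let squares = Squares(count: 6)
let middle = squares[2..<4]        // default subscript(bounds:)
let tail = squares.dropFirst(3)    // default dropFirst, returning the same SubSequence

print(type(of: middle))   // Slice<Squares>
print(Array(middle))      // [4, 9]
print(Array(tail))        // [9, 16, 25]
```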

Look! Laziness!

I don’t know why every single post I write talks about laziness. This entire post is supposed to be about a strict, contiguous data structure. Ah, well.

So where is it relevant here? Well, AnySequence is lazy, kind of. It’s a wrapper around a generator, and it only calls the next() method when needed. So if your generator is lazy, your AnySequence is lazy. Some of the methods above don’t seem well suited to laziness, at least at first glance. Let’s check which ones are, using an anyGenerator that prints each step of evaluation:
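The experiment above relies on Swift 2’s anyGenerator. Here’s a sketch of the same idea in current Swift, where AnyIterator plays that role and a hypothetical CountingSequence records every call to next() in a shared counter instead of printing. In today’s standard library, dropFirst(_:) on a plain Sequence returns a lazy DropFirstSequence wrapper, while dropLast(_:) returns an Array, so only dropLast walks the sequence up front.

```swift
/// Shared tally of how many elements have actually been produced.
final class Counter {
    var calls = 0
}

/// A hypothetical sequence of 0..<limit that counts each evaluation step.
struct CountingSequence: Sequence {
    let limit: Int
    let counter: Counter

    func makeIterator() -> AnyIterator<Int> {
        let limit = self.limit, counter = self.counter
        var current = 0
        return AnyIterator {
            guard current < limit else { return nil }
            counter.calls += 1   // one element evaluated
            defer { current += 1 }
            return current
        }
    }
}

let counter = Counter()
let nums = CountingSequence(limit: 10, counter: counter)

_ = nums.dropFirst(5)
print(counter.calls)   // 0: nothing evaluated yet
_ = nums.dropLast(5)
print(counter.calls)   // 10: dropLast had to walk the whole sequence
```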

Strangely, neither of these forces evaluation. You could imagine an implementation that would skip over the first n elements and then return an AnySequence; instead, it seems the Swift team put the skipping inside the AnySequence itself.

nums.dropLast()
nums.dropLast(5)

Both of these do evaluate the whole sequence. It’s pretty clear why: in order to know which element is last, you need to walk along the whole sequence. However, it’s not impossible to perform these somewhat lazily: you could hold n elements ahead of the element you were returning, like this:
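A sketch of that look-ahead approach in current Swift. The hypothetical LazyDropLastSequence keeps a buffer of upcoming elements and only hands one out once it knows n more follow it. It uses Array.removeFirst(), which shifts the whole buffer on every call.

```swift
/// Drops the last `n` elements of `base`, pulling from it only as needed.
struct LazyDropLastSequence<Base: Sequence>: Sequence {
    let base: Base
    let n: Int

    func makeIterator() -> AnyIterator<Base.Element> {
        let n = self.n
        precondition(n >= 0, "Can't drop a negative number of elements")
        var iterator = base.makeIterator()
        var buffer: [Base.Element] = []   // look-ahead window
        var exhausted = false
        return AnyIterator {
            // An element can only be handed out once n more are known to follow it.
            while !exhausted && buffer.count <= n {
                if let next = iterator.next() {
                    buffer.append(next)
                } else {
                    exhausted = true
                }
            }
            // If base ran out first, everything left in the buffer is among the last n.
            guard buffer.count > n else { return nil }
            return buffer.removeFirst()   // O(buffer.count): the inefficient part
        }
    }
}

let firstSeven = Array(LazyDropLastSequence(base: 1...10, n: 3))
print(firstSeven)   // [1, 2, 3, 4, 5, 6, 7]
```

The exhausted flag stops the iterator from calling next() on base again after it has already returned nil, which IteratorProtocol doesn’t require to be safe.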

It’s only lazy in a very loose sense of the word, though. It has to hold n elements ahead of the one it’s returning, which costs memory, and it has to call removeFirst() on an array over and over, which shifts every remaining element each time. Maybe if we had some other kind of data structure… one that had O(1) appending and removeFirst…

So the other methods are as lazy as you’d expect them to be. Is this significant? Maybe. Another addition to the standard library in the new beta was a Ruby-style function: forEach (Ruby calls it each). This one is controversial: it has one parameter, a closure, which it calls on each element of a sequence. A functional-style chained function that’s just… a side effect. It seems ugly. But sometimes side effects are elegant (seriously, I promise). Where it’s most obviously useful, though, is at the end of a long chain of methods.

This laziness is relevant when you read the warnings that come along with forEach:

You cannot use the break or continue statement to exit the current call of the body closure or skip subsequent calls.

The lazy prefix method, especially, allows you to break out of the loop-like patterns you might find yourself emulating with forEach.
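A small sketch of that pattern in current Swift: since forEach has no break, a lazy prefix(_:) earlier in the chain decides how much work gets done. The hypothetical firstSquares function records which inputs the map closure actually touched; thanks to lazy, only the first limit numbers out of a thousand are ever evaluated.

```swift
/// Squares the numbers 1...1000 but keeps only the first `limit` results,
/// also reporting which inputs the map closure actually evaluated.
func firstSquares(limit: Int) -> (squares: [Int], evaluated: [Int]) {
    var evaluated: [Int] = []
    var squares: [Int] = []
    (1...1_000).lazy
        .map { n -> Int in
            evaluated.append(n)   // side effect: record each evaluation
            return n * n
        }
        .prefix(limit)            // the "break": nothing past limit is requested
        .forEach { squares.append($0) }
    return (squares, evaluated)
}

let result = firstSquares(limit: 3)
print(result.squares)     // [1, 4, 9]
print(result.evaluated)   // [1, 2, 3]
```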

Back to the Deque

What’s annoying about the new SequenceType is that since all of those functions are declared in the protocol itself, they’re all requirements. That’s no problem if you’re happy with the default implementations; the issue arises when you want to provide your own versions of them. If you write your own dropFirst, you override the type inference for SubSequence. Now the compiler will look for all the other methods that return that same type of subsequence: if you override one, you must override them all.

This little Deque was shaping up to be a bit of a behemoth, so I decided to go all in! You probably know that an Array slices into an ArraySlice. This struct is just a view into the original array: it holds a reference to the array’s storage, along with the start and end indices of the slice. That way, slicing avoids expensive operations like resizing the original array or shifting its elements around in memory.
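You can see the view-like nature of ArraySlice in its indices, which it shares with the array it came from. A quick check in current Swift:

```swift
let array = [10, 20, 30, 40, 50]
let slice = array[2..<4]   // ArraySlice<Int>: a view onto array's storage

// The slice keeps the original array's indices rather than starting at 0.
print(slice.startIndex, slice.endIndex)   // 2 4
print(slice[2])                           // 30

// Copying the elements into a fresh Array renumbers them from 0.
let copy = Array(slice)
print(copy.startIndex, copy[0])           // 0 30
```

Slices still have value semantics: mutating one copies the shared storage first, so the original array never changes through it.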

Being proper, ContiguousDeque should be done in the same way. Like this:

And then you can define all of the subsequent functions just once. But unfortunately, in Swift’s current version, protocol extensions can’t give you conformance to other protocols. Now, you could define each of the functions, and then have a couple lines declaring conformance on each individual implementation, but that relentlessly crashed Xcode for me.
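A minimal sketch of that pattern in current Swift, where a protocol extension still can’t declare conformance to another protocol. The protocol TwoArrayStorage, its helpers elementCount and element(at:), and the two structs are hypothetical names. The tricky indexing logic is written once in the extension, but each concrete type still has to declare its own RandomAccessCollection conformance and forward to it. To keep things short, this DequeSlice owns its own arrays rather than being a real view into a Deque.

```swift
/// The shared shape of a deque and its slice: a reversed front, then an in-order back.
protocol TwoArrayStorage {
    associatedtype Element
    var front: ContiguousArray<Element> { get }   // stored reversed
    var back: ContiguousArray<Element> { get }    // stored in order
}

extension TwoArrayStorage {
    // The tricky indexing logic, written exactly once.
    var elementCount: Int { front.count + back.count }

    func element(at i: Int) -> Element {
        i < front.count ? front[front.count - 1 - i] : back[i - front.count]
    }
}

struct Deque<Element>: TwoArrayStorage {
    var front: ContiguousArray<Element>
    var back: ContiguousArray<Element>
}

// Stand-in for a slice type; a real one would be a view into a Deque's storage.
struct DequeSlice<Element>: TwoArrayStorage {
    var front: ContiguousArray<Element>
    var back: ContiguousArray<Element>
}

// `extension TwoArrayStorage: RandomAccessCollection` isn't allowed,
// so each concrete type declares the conformance and forwards to the shared logic.
extension Deque: RandomAccessCollection {
    var startIndex: Int { 0 }
    var endIndex: Int { elementCount }
    subscript(i: Int) -> Element { element(at: i) }
}

extension DequeSlice: RandomAccessCollection {
    var startIndex: Int { 0 }
    var endIndex: Int { elementCount }
    subscript(i: Int) -> Element { element(at: i) }
}
```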

What this means is lots and lots of code. I went a little overboard, because I wanted only homemade versions of each function, but even so, it was tough. The SequenceType conformance alone was 100 lines:

At the end of it all, I got my Deque. It was contiguous. It was cool. There were a couple of stack-like functions included in the latest beta, popFirst() and popLast(), and they fit in well with everything else.

But there was a problem. I could not understand any of the code. Look at this function, for instance:

Well, the conditions for the switch are OK. subRange.startIndex < front.endIndex is true if the beginning of the subRange being replaced lies in the front array. The second one, subRange.endIndex <= front.endIndex, tells you whether the end of the subRange is in the front array. So (true, true) means the entire subRange is in the front, (true, false) means it spans both, and (false, false) means it’s in the back array. But the last case, (false, true)? It’s correct; there’s no typo. That’s when the subRange is empty (it would look something like 4..<4) and sits directly between the two arrays.
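Here’s a sketch of just that classification step in current Swift, with the four cases given names. RangeLocation, locate(_:frontCount:) and reversedFrontRange(_:frontCount:) are hypothetical helpers, not the post’s code. The last one shows how a range of deque positions lying wholly in the front maps onto the reversed front array, one of the spots where off-by-one errors creep in.

```swift
/// Where a subrange of deque positions falls, given how many elements are in front.
enum RangeLocation {
    case front            // entirely in the front array
    case spansBoth        // starts in front, ends in back
    case back             // entirely in the back array
    case emptyAtBoundary  // empty, and sitting exactly between the two arrays
}

func locate(_ subRange: Range<Int>, frontCount: Int) -> RangeLocation {
    switch (subRange.lowerBound < frontCount, subRange.upperBound <= frontCount) {
    case (true, true):   return .front
    case (true, false):  return .spansBoth
    case (false, false): return .back
    case (false, true):  return .emptyAtBoundary   // lowerBound == upperBound == frontCount
    }
}

/// Maps deque positions lying wholly in front to storage indices in the reversed front array.
func reversedFrontRange(_ subRange: Range<Int>, frontCount: Int) -> Range<Int> {
    precondition(subRange.upperBound <= frontCount, "range must lie within front")
    return (frontCount - subRange.upperBound)..<(frontCount - subRange.lowerBound)
}

print(locate(4..<4, frontCount: 4))               // emptyAtBoundary
print(reversedFrontRange(0..<2, frontCount: 5))   // 3..<5
```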

How about the defer { check() }? The reverse when inserting into the front? How to find the various ranges to insert? This code is absolutely lethal.

And I don’t think it’s necessarily my fault, either. I mean, it definitely is a little: I didn’t have to overload every method on the Deque. In fact, it probably would have been better to build up the methods from a smaller base: I would only have to do difficult indexing logic like that above once, and then call on it later on.

But the logic is difficult whatever way you look at it. Indexing is confusing, and off-by-one errors are annoying. So what do you do? You write tests!

Testing has gotten a little better in the new beta. You can do a @testable import, and as of Swift 2.0, code coverage information is available.

Compile times are still too long, though. Uninformative errors are far too common, and abstraction efforts are rewarded with more crashes. But there’s no other option. I had no idea if my Deque would work, so I had to test.

Testing is one of those things that I always find myself doing, but never formally. I feel like I’m just not the kind of person to plan them. I’m easy-going, right?

Except that 400 lines later I have an acute awareness of the problem with that philosophy. The planning saves time, not the other way around. A formalised system has been formalised for a reason.

So I didn’t look up other people’s ways to do tests (that’s the next post), but I did decide on how I was going to do them. I realised that I wanted the Deque to act exactly as an array would, so I would have a function that would generate a sequence of equivalent arrays and Deques:

And you can put whatever modification you want to perform on both in the middle. Since the things you’d be testing are methods and properties rather than free functions, and they’re on different types, you can’t do something like this:

But it worked pretty well – I found myself rewriting an awful lot of my code, in order to pass them. I don’t think I’ll be able to trust the other things I write without this level of rigour.
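A sketch of that testing approach in current Swift. The hypothetical equivalentPairs(upTo:) helper builds every small array along with deques holding the same elements, split between front and back at every possible point, so each balance state gets exercised. Because Array and the deque are different types, behaveTheSame takes the modification twice, once per type, and compares the results. The Deque here is a cut-down stand-in with just enough API for the example.

```swift
/// Cut-down deque: front stored reversed, back in order.
struct Deque<Element> {
    var front: ContiguousArray<Element>
    var back: ContiguousArray<Element>

    mutating func removeFirst() -> Element {
        if let first = front.popLast() { return first }
        return back.removeFirst()
    }

    var elements: [Element] { Array(front.reversed()) + Array(back) }
}

/// Every array 0..<count for count below `limit`, paired with a deque holding
/// the same elements split between front and back at every possible point.
func equivalentPairs(upTo limit: Int) -> [([Int], Deque<Int>)] {
    var pairs: [([Int], Deque<Int>)] = []
    for count in 0..<limit {
        let array = Array(0..<count)
        for split in 0...count {
            let deque = Deque(front: ContiguousArray(array[..<split].reversed()),
                              back: ContiguousArray(array[split...]))
            pairs.append((array, deque))
        }
    }
    return pairs
}

/// Applies the same modification (written once per type) to both halves of
/// every pair, and checks they still hold the same elements afterwards.
func behaveTheSame(upTo limit: Int,
                   _ onArray: (inout [Int]) -> Void,
                   _ onDeque: (inout Deque<Int>) -> Void) -> Bool {
    return equivalentPairs(upTo: limit).allSatisfy { pair in
        var (array, deque) = pair
        onArray(&array)
        onDeque(&deque)
        return array == deque.elements
    }
}

let ok = behaveTheSame(upTo: 6,
                       { if !$0.isEmpty { $0.removeFirst() } },
                       { if !$0.elements.isEmpty { _ = $0.removeFirst() } })
print(ok)   // true
```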

One other point: I found it really helpful to get together some axioms or invariants that should be true for a given struct/function, regardless of what happens. That’s the “isBalanced” variable for deques, for instance. In some protocols, helpfully, there are axioms actually specified. RandomAccessIndexType, for instance, has axioms for its two methods:

advancedBy(_:)
Return self offset by n steps.

Return Value

If n > 0, the result of applying successor to self n times. If n < 0, the result of applying predecessor to self n times. Otherwise, self.
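That axiom translates almost word for word into a check. In current Swift, advancedBy(_:) has been replaced by the collection’s index(_:offsetBy:), and successor/predecessor by index(after:) and index(before:). Here’s a sketch of a hypothetical generic offsetAxiomHolds(_:) that tests the same property on any BidirectionalCollection, for every starting index and every offset that stays in bounds.

```swift
/// Checks, for every index i and every in-bounds offset n, that
/// index(i, offsetBy: n) equals applying index(after:) n times (n > 0),
/// index(before:) -n times (n < 0), or i itself (n == 0).
func offsetAxiomHolds<C: BidirectionalCollection>(_ c: C) -> Bool {
    let positions = Array(c.indices) + [c.endIndex]   // every valid index, including endIndex
    for (k, i) in positions.enumerated() {
        for n in -k...(positions.count - 1 - k) {
            var stepped = i
            if n > 0 {
                for _ in 0..<n { stepped = c.index(after: stepped) }
            } else if n < 0 {
                for _ in 0..<(-n) { stepped = c.index(before: stepped) }
            }
            if c.index(i, offsetBy: n) != stepped { return false }
        }
    }
    return true
}

print(offsetAxiomHolds([3, 1, 4, 1, 5]))   // true
print(offsetAxiomHolds("héllo"))           // true
```

Because it’s generic, the same check can run unchanged against an Array, a String, or any custom collection with bidirectional indices.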

You could probably write some kind of function that would run types conforming to certain protocols through a normal set of tests for those protocols. It might take another function that tested for an invariant on your type. This would make a lot of this stuff reusable, but that’s for another blog post at this stage.

The ContiguousDeque, ContiguousList, ContiguousDequeSlice, and ContiguousListSlice structs are all here on GitHub, along with all of their tests.