Iterators yielding mutable references, take 2

Oct 24, 2013

OK, after writing the
post on iterators that yield mutable references, and discussing
with some folk on IRC, I remembered something I had forgotten. There
is actually a way to phrase the mutable vector iterator differently
such that it is safe. Actually, the end result still has some unsafe
code, but it takes the form of a simple helper function, and it’s
quite plausible to imagine that code becoming safe eventually. Even
better, the approach generalizes to other data structures.

Rephrasing the mutable vector iterator

Here is another implementation of an iterator over mutable references.
It works differently than the previous one. Rather than tracking an
index into a slice, it works by keeping an ever-shrinking slice. Each
time we invoke next(), the slice self.data is rewritten to exclude
the value that was just returned.

struct VecMutIterator<'v, T> {
data: &'v mut [T],
}
impl<'v, T> Iterator<&'v mut T> for VecMutIterator<'v, T> {
fn next<'n>(&'n mut self) -> Option<&'v mut T> {
pop_mut_ref(&mut self.data)
}
}
fn pop_mut_ref<'v, T>(data: &mut &'v mut [T]) -> &'v mut T {
// Get a pointer to the 0th element. Borrow checker
// would limit the lifetime of this pointer to the current
// fn body, for the reasons discussed in my previous post,
// but we can cheat and extend it further.
//
// This temporarily creates an unsafe situation, since there
// are two paths to the same data: `result` and `(*data)[0]`
let result = unsafe {
unsafe::copy_mut_lifetime(*data, &mut (*data)[0])
};
// Now adjust `*data` in place so that it no longer includes
// the 0th element. Now there is no more overlap.
*data = data.slice_from(1);
result
}

As you can see, you do still require an unsafe keyword to implement
the iterator. However, the scope of the unsafety is much more limited:
the invariant that there be only one mut pointer to a given piece of
memory is restore when next() returns, and hence we do not need to
concern ourselves with privacy. This is the kind of unsafe fn I prefer.

Interestingly, this “mutate in place” approach is precisely what the
actual vector iterators do – but they do it using raw, unsafe
pointers. This is primarily for performance. But it’s nice to see that
there is a safe – or almost, almost safe – equivalent in plain old
Rust.

Extending the approach

What’s kind of funny is that I thought of this approach a while back
when we were initially discussing iterators. In that context,
I was looking for a safe way to implement mutable iterators on
trees. The post includes a code example, but it is old and actually
expressed in terms of a MutIterator trait, so I wanted to generalize
it here to the Iterator trait we really use.

We need to maintain the the invariant that there is never more than
one pointer to the same data. The way we do this is that each time we
pop a node off the stack, we immediately decompose it into three
distinct pointers (value, left, and right), none of which
overlap. Now we can push left and right back onto the stack for
the future and return value. So the code operates on the same
principle as the iterator from the first section: destructively mutate
the iterator in place so that it no longer contains a way to reach
the value you just returned.

This is a pre-order iterator. With a little bit of effort, it should
be possible to adapt the approach to do a (safe) post-order iterator,
but I leave that as an exercise to the reader.

In case you’re curious why this iterator can be implemented
safely, the answer is that the unsafety is contained in vec::pop(),
which – similarly to pop_mut_ref() from before – atomically
removes an item and adjusts the vector to not contain it.

Side note: It is not necessary to use the pattern matching let
assignment to break apart the current node into its three fields. You
could rewrite the code above and replace references to left with
current.left, right with current.right, and so on. The borrow
checker is smart enough to handle paths like this as long as they are
contained within a single function. However, I thought the code read
more clearly if we did the decomposition into three parts in one
atomic step.

Conclusions

With respect to the question I posed in my previous post, this
pushes me squarely into the “keep iterators the way they are” camp.