Three months ago, I proposed Stacked Borrows as a model for defining what kinds of aliasing
are allowed in Rust, and the idea of a validity invariant that has to be maintained by all code at
all times. Since then I have been busy implementing both of these, and
developed Stacked Borrows further in doing so. This post describes the latest
version of Stacked Borrows, and reports my findings from the implementation
phase: What worked, what did not, and what remains to be done. There will also
be an opportunity for you to help the effort!

Secondly, reading through a mutable reference is actually okay even when that reference is not exclusive.

Does LLVM noalias affect optimizations only when reading through pointers?

For example, if I have a &mut and a &, and I cast them to raw pointers and compare them for equality, I’d expect the comparison to be optimized away if the &mut is noalias because no other pointer in the current scope can alias that memory, and therefore, cannot be pointing to that memory. So two such pointers cannot be equal.

I made a small update: Dereferencing a pointer now always preserves the tag, but casting to a raw pointer resets the tag to Shr(None). Box is treated like a mutable reference.
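As a rough illustration of that update, here is a toy model of a tagged pointer. Only `Shr(None)` comes from the post above; the `Uniq` variant, the `u64` timestamps, the `Pointer` struct, and both method names are assumptions made for this sketch, not miri's actual data structures.

```rust
/// Toy pointer tag. `Shr(None)` is the tag named in the post; `Uniq` and the
/// `u64` timestamps are assumptions made for this illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Borrow {
    Uniq(u64),
    Shr(Option<u64>),
}

/// Hypothetical tagged pointer: an address plus the tag it carries.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pointer {
    addr: usize,
    tag: Borrow,
}

impl Pointer {
    /// Dereferencing keeps whatever tag the pointer already has.
    fn deref_place(self) -> Pointer {
        self
    }

    /// Casting to a raw pointer keeps the address but resets the tag.
    fn cast_to_raw(self) -> Pointer {
        Pointer { addr: self.addr, tag: Borrow::Shr(None) }
    }
}
```

In this sketch, a pointer tagged `Uniq(3)` still carries `Uniq(3)` after `deref_place`, while `cast_to_raw` yields the same address tagged `Shr(None)`.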

gnzlbg:

Does LLVM noalias affect optimizations only when reading through pointers?

No, it basically says operations on this pointer do not conflict with operations on other pointers. Since two reads never conflict, that gives a little more freedom for reads.

gnzlbg:

I’d expect the comparison to be optimized away

noalias says nothing about whether the pointers are equal, and cannot be used for optimizing equality tests. For example, in the following function, both arguments are marked noalias, yet clearly the pointers could be equal:
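A minimal sketch of such a function, given as an illustration rather than the snippet from the original reply. The name `same_address` is made up here, and the comment about `noalias` reflects how rustc is understood to annotate shared references to types without interior mutability.

```rust
/// Both parameters are shared references to `i32`. rustc is understood to
/// mark such parameters `noalias` in the LLVM IR (assumption: no interior
/// mutability involved), since nothing can be written through them.
fn same_address(x: &i32, y: &i32) -> bool {
    // Compares the addresses, not the pointed-to values.
    std::ptr::eq(x, y)
}
```

Calling `same_address(&v, &v)` is perfectly legal safe Rust and returns `true`, so the comparison cannot be folded to `false` just because the arguments are `noalias`.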

Three months ago, I proposed Stacked Borrows as a model for defining what kinds of aliasing are allowed in Rust, and the idea of a validity invariant that has to be maintained by all code at all times

Sorry if I’m asking something that’s maybe too obvious. What’s the idea of developing this model? Is it to prove correctness of the Rust borrow checker, or is the idea to construct a more ergonomic one? Maybe the idea is to provide an example correct-by-construction implementation of it?

What is it? I’m really curious

PS: for how little I know of memory management and aliasing, your post is quite readable. Thanks for taking the time to make it so!

I’m not directly involved in any of these efforts, but to check my own understanding…

The primary motivation is that Rust currently has no formal specification to tell us what is and isn’t legal unsafe code, i.e. what unsafe code triggers UB (undefined behavior) and what doesn’t. That’s a pretty big deal for all unsafe code in the ecosystem, and all the safe code that relies on it (which amounts to all Rust code ever). It’s arguably the only sense in which Rust is “less safe” than C++ today; in Rust it’s not even possible to know if your unsafe code is UB or not. So if “Stacked Borrows” turns out to be a good fit for all unsafe Rust code in the wild, we might have an RFC to formally adopt it as (a big part of) the official definition of UB in Rust.

A very close second motivation is that it seems like we can do way better than just adopting a formal spec. First, mere mortals like you and me can actually understand this model! (I still don’t have a clue whether any of my C++ code is technically UB.) Second, even the compiler understands it, so it can programmatically verify the presence or absence of UB! There are probably a ton of caveats on that (we obviously can’t check FFI; unlike with the borrow checker, you need very thorough test coverage; etc.), but that would still be amazing.

Second, even the compiler understands it, so it can programmatically verify the presence or absence of UB!

The compiler doesn’t really, miri does, which is an interpreter based on the compiler. Think of it like one of these sanitizers (msan, asan etc) that exist for C/C++ code.

felix91gr:

So, it’s like the sibling of the borrow checker, the difference being that this one checks the safety (or non-UB quality of, at least) of unsafe code

Kind-of, yes, but there is one more very important difference: The borrow checker is static: you run it once on a piece of code, and you know that every way of executing that code (with any possible values for the arguments etc.) is safe. Stacked Borrows and its implementation in miri are dynamic: you can run your code with it, given concrete inputs and concrete values for all variables, and then it tells you whether your code is safe for those inputs. To get the same guarantee as the static check (safety for all possible inputs), you would have to try all possible inputs, of which there are way too many. So the guarantee you get is weaker, but then on the plus side you can also use it for unsafe code.

Basically, for Stacked Borrows and miri to be useful, you better have a good test suite with excellent coverage. And even then you cannot be sure what happens when you try other inputs. The borrow checker doesn’t need any test suite, it can handle all inputs at once – but it only works on safe code.
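To make the "concrete inputs" point tangible, here is a minimal sketch (the function `bump_both` is hypothetical, not from this thread). It compiles, because the borrow checker does not track raw pointers, yet whether it behaves correctly depends on the arguments: with `i != j` the two mutable references cover disjoint elements, while with `i == j` they alias. The classification of the aliasing case as UB is stated here as an assumption about how a Stacked-Borrows-style model treats it.

```rust
/// Hypothetical example: increments `v[i]` and `v[j]` through two `&mut`
/// references created from one raw pointer.
///
/// Only well-behaved when `i != j`. With `i == j`, `a` and `b` alias the same
/// element, which an aliasing model like Stacked Borrows is meant to rule out
/// (assumption). The function is deliberately left unsound for that input.
fn bump_both(v: &mut [i32], i: usize, j: usize) {
    assert!(i < v.len() && j < v.len(), "index out of bounds");
    let p = v.as_mut_ptr();
    unsafe {
        let a = &mut *p.add(i);
        let b = &mut *p.add(j); // aliases `a` when i == j
        *a += 1;
        *b += 1;
    }
}
```

A test suite that only ever calls `bump_both` with distinct indices would run cleanly under a dynamic checker and never exercise the problematic input, which is why coverage matters so much here.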

Think of it like one of these sanitizers (msan, asan etc) that exist for C/C++ code.

Ohh, I get it now. A testing suite that runs miri with the Stacked Borrows system sounds like a really good idea!

RalfJung:

The borrow checker doesn’t need any test suite, it can handle all inputs at once

That reminds me of universal quantifiers in first-order logic (this would be the borrow checker) vs. propositional logic (this would be the Stacked Borrows system). Do both systems share a relationship like that one? Can one be expressed in terms of the other, in some sense?

The compiler doesn’t really, miri does, which is an interpreter based on the compiler. Think of it like one of these sanitizers (msan, asan etc) that exist for C/C++ code.

From what I understand, the sanitizers you mention aren’t interpreters; they run as compiled, instrumented code. Could a sanitizer theoretically implement the Stacked Borrows model in a fully compiled, but instrumented, Rust program? I’m not sure if this would buy anything.

That reminds me of universal quantifiers in first-order logic (this would be the borrow checker) vs. propositional logic (this would be the Stacked Borrows system). Do both systems share a relationship like that one? Can one be expressed in terms of the other, in some sense?

It’s about a universal quantification, yes – the borrow checker checks “is this safe for all inputs”, Stacked Borrows checks (defines, really) “is this safe for one particular given input”.

There is also a difference in precision though: The borrow checker will sometimes reject code that is actually safe for all inputs, because code can be too complicated for an automated check to figure out that it is always safe. That is the case for all the unsafe code wrapped in safe abstractions, actually (except for the ones that have bugs).
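A classic instance is splitting a slice into two disjoint mutable halves. The sketch below is modeled on the standard library's `split_at_mut`; the name `split_halves` is made up for this illustration. Writing the body as `(&mut v[..mid], &mut v[mid..])` is rejected by the borrow checker, since it sees two mutable borrows of `v`, even though the halves never overlap for any `mid`.

```rust
use std::slice;

/// Hypothetical safe wrapper: splits `v` into `[0, mid)` and `[mid, len)`.
/// The two halves never overlap, but the borrow checker cannot see that,
/// so the body has to go through a raw pointer.
fn split_halves(v: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = v.len();
    assert!(mid <= len, "mid out of bounds");
    let p = v.as_mut_ptr();
    unsafe {
        // SAFETY: `mid <= len`, so both ranges lie inside the original slice
        // and are disjoint from each other.
        (
            slice::from_raw_parts_mut(p, mid),
            slice::from_raw_parts_mut(p.add(mid), len - mid),
        )
    }
}
```

Because the assertion rules out the only bad inputs, callers can use this from safe code; it is the unsafe body that an aliasing model has to justify.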

skippy:

From what I understand, the sanitizers you mention aren’t interpreters; they run as compiled, instrumented code. Could a sanitizer theoretically implement the Stacked Borrows model in a fully compiled, but instrumented, Rust program? I’m not sure if this would buy anything.

The only difference between interpretation and instrumented machine-code execution is the speed at which it runs. So yes, Stacked Borrows could also be implemented as instrumented compilation, and that would (a) be much faster than miri, and (b) allow FFI to C. It would, however, probably also be less precise.

It’s about a universal quantification, yes – the borrow checker checks “is this safe for all inputs”, Stacked Borrows checks (defines, really) “is this safe for one particular given input”.

I’m not entirely clear on which bits of unsafe code you’re targeting here. One question that springs to mind is: for relatively self-contained code, would it not be possible to go from “safe for a particular input” to “safe for all inputs” by symbolically executing the unsafe code (including the stacked borrows instrumentation) starting from memory and registers that are all symbolic?

AFAIU, this should work for unsafe code that does statically bounded pointer-chasing (though perhaps that’s not interesting?). Alternatively, (heap) shape analysis or significantly bounding the size of the symbolic memory (neither of which may be simple, of course) might help? Though again, I don’t have any concrete examples of the type of code that you’d want to check.

would it not be possible to go from “safe for a particular input” to “safe for all inputs” by symbolically executing the unsafe code (including the stacked borrows instrumentation) starting from memory and registers that are all symbolic?

Sure, that is always the case. Once you have a definition “safe for some input”, you can define “for all inputs, safe for that input”, and you can try to prove that for some particular function. This will not always be possible automatically, but symbolic execution is one of many techniques that one could try to use and it will work in some cases. Once we have a better understanding of Stacked Borrows, I certainly hope such tools will be developed! (I have no experience with symbolic execution myself.)

Sure, that is always the case. Once you have a definition “safe for some input”, you can define “for all inputs, safe for that input”, and you can try to prove that for some particular function. This will not always be possible automatically, but symbolic execution is one of many techniques that one could try to use and it will work in some cases. Once we have a better understanding of Stacked Borrows, I certainly hope such tools will be developed! (I have no experience with symbolic execution myself.)

That’s great to hear. I only have (limited) experience with symbolic execution. Which of those other techniques would you consider semi-practical for this task?

I’d say we should throw everything we have at it. I am not an expert in automated analysis. Another approach might be abstract interpretation, or some kind of automated theorem proving, and then there are approaches that need some help from the user (which might be reasonable for critical code).

Really, you could occupy an entire swarm of grad students with projects to throw all verification methods known to man at this problem. The main challenge remains developing something that actually works on real-world code and can be meaningfully embedded into productive development (as opposed to a one-off verification effort that becomes meaningless with the next refactoring).