Little Orphan Impls

We’ve recently been doing a lot of work on Rust’s orphan rules,
which are an important part of our system for guaranteeing trait
coherence. The idea of trait coherence is that, given a trait and
some set of types for its type parameters, there should be exactly one
impl that applies. So if we think of the trait Show, we want to
guarantee that if we have a trait reference like MyType : Show, we
can uniquely identify a particular impl. (The alternative to coherence
is to have some way for users to identify which impls are in scope at
any time. It has its own complications; if you’re curious for
more background on why we use coherence, you might find this
rust-dev thread from a while back to be interesting
reading.)

The role of the orphan rules in particular is basically to prevent
you from implementing external traits for external types. So
continuing our simple example of Show, if you are defining your own
library, you could not implement Show for Vec<T>, because both
Show and Vec are defined in the standard library. But you can
implement Show for MyType, because you defined MyType. However,
if you define your own trait MyTrait, then you can implement
MyTrait for any type you like, including external types like
Vec<T>. To this end, the orphan rule intuitively says “either the
trait must be local or the self-type must be local”.
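To make that intuition concrete, here is a sketch of the cases just described (MyType and MyTrait stand for items local to your crate; this is illustrative, not the precise rule):

impl<T> Show for Vec<T> { ... }    // error: both Show and Vec are external

impl Show for MyType { ... }       // OK: the self-type is local

impl<T> MyTrait for Vec<T> { ... } // OK: the trait is local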

More precisely, the orphan rules are targeting the case of two
“cousin” crates. By cousins I mean that the crates share a common
ancestor (i.e., they link to a common library crate). This would be
libstd, if nothing else. That ancestor defines some trait. Both of the
crates are implementing this common trait using their own local types
(and possibly types from ancestor crates, which may or may not be in
common). But neither crate is an ancestor of the other: if they were,
the problem would be much easier, because the descendant crate can see the
impls from the ancestor crate.

When we extended the trait system to support multidispatch, I confess that I originally didn’t give the
orphan rules much thought. It seemed like it would be straightforward
to adapt them. Boy was I wrong! (And, I think, our original rules were
kind of unsound to begin with.)

The purpose of this post is to lay out the current state of my
thinking on these rules. It sketches out a number of variations and
possible rules and tries to elaborate on the limitations of each
one. It is intended to serve as the seed for a discussion in the
Rust discussion forums.

The first, totally wrong, attempt

The first attempt at the orphan rules was just to say that an impl is
legal if a local type appears somewhere. So, for example, suppose that I
define a type MyBigInt and I want to make it addable to integers:

impl Add<i32> for MyBigInt { ... }
impl Add<MyBigInt> for i32 { ... }

Under these rules, these two impls are perfectly legal, because
MyBigInt is local to the current crate. However, the rules also
permit an impl like this one:

impl<T> Add<T> for MyBigInt { ... }

Now the problems arise because those same rules also permit an impl
like this one (in another crate):

impl<T> Add<YourBigInt> for T { ... }

Now we have a problem because both impls are applicable to
Add<YourBigInt> for MyBigInt.

In fact, we don’t need multidispatch to have this problem. The same
situation can arise with Show and tuples:
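The elided example presumably looked something like the following sketch (YourType stands for a type local to some other crate):

// Crate A:
impl<T> Show for (MyType, T) { ... }

// Crate B:
impl<T> Show for (T, YourType) { ... }

// Both impls apply to `(MyType, YourType)`.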

(In fact, multidispatch is really nothing more than a compiler-supported
version of implementing a trait for a tuple.)

The root of the problem here lies in our definition of “local”, which
completely ignored type parameters. Because type parameters can be
instantiated to arbitrary types, they are obviously special, and must
be considered carefully.

The ordered rule

This problem was first brought to our attention by arielb1, who
filed Issue 19470. To resolve it, he proposed a rule that I
will call the ordered rule. The ordered rule goes like this:

Write out all the type parameters to the trait, starting with Self.

The name of some local struct or enum must appear on that line before the first
type parameter.

More formally: When visiting the types in pre-order, a local type must be visited
before any type parameter.

In terms of the examples I gave above, this rule permits the following impls:
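Concretely (reconstructing the elided list), the ordered rule permits all three of the MyBigInt impls shown earlier, since in each one a local type is visited before any type parameter:

impl Add<i32> for MyBigInt { ... }
impl Add<MyBigInt> for i32 { ... }
impl<T> Add<T> for MyBigInt { ... }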

However, it avoids the quandary we saw before because it rejects this impl:

impl<T> Add<YourBigInt> for T { ... }

This is because, if we wrote out the type parameters in a list, we would get:

T, YourBigInt

and, as you can see, T comes first.

This rule is actually pretty good. It meets most of the requirements
I’m going to unearth. But it has some problems. The first is that it
feels strange; it feels like you should be able to reorder the type
parameters on a trait without breaking everything (we will see that
this is not, in fact, obviously true, but it was certainly my first
reaction).

Another problem is that the rule is kind of fragile. It can easily
reject impls that don’t seem particularly different from impls that it
accepts. For example, consider the case of the Modifier trait
that is used in hyper and iron. As you can see in this issue,
iron wants to be able to define a Modifier impl like the following:

struct Response;
...
impl Modifier<Response> for Vec<u8> { .. }

This impl is accepted by the ordered rule (there are no type parameters at all,
in fact). However, the following impl, which seems very similar and equally
likely (in the abstract), would not be accepted:

struct Response;
...
impl<T> Modifier<Response> for Vec<T> { .. }

This is because the type parameter T appears before the local type
(Response). Hmm. It doesn’t really matter if T appears in the local type,
either; the following would also be rejected:

struct MyHeader<T> { .. }
...
impl<T> Modifier<MyHeader<T>> for Vec<T> { .. }

Another trait that couldn’t be handled properly is the BorrowFrom trait
in the standard library. There are a number of impls like this one:

impl<T> BorrowFrom<Rc<T>> for T

This impl fails the ordered check because T comes first. We can make
it pass by switching the order of the parameters, so that the
BorrowFrom trait becomes Borrow.

A final “near-miss” occurred in the standard library with the Cow
type. Here is an impl from libcollections of FromIterator for a
copy-on-write vector:

impl<'a, T> FromIterator<T> for Cow<'a, Vec<T>, [T]>

Note that Vec is a local type here. This impl obeys the ordered
rule, but somewhat by accident. If the type parameters of the Cow
type were in a different order, it would not, because then [T]
would precede Vec<T>.

The covered rule

In response to these shortcomings, I proposed an alternative rule that
I’ll call the covered rule. The idea of the covered rule was to say
that (1) the impl must have a local type somewhere and (2) a type
parameter can only appear in the impl if the type parameter is
covered by a local type. Covered means that it appears “inside” the
type: so T is covered by MyVec in the type MyVec<T> or
MyBox<Box<T>>, but not in (T, MyVec<int>). This rule has the
advantage of having nothing to do with ordering and it has a certain
intuition to it; any type parameters that appear in your impls have to
be tied to something local.

For example, the covered rule accepts the BorrowFrom impl we saw earlier.
The reason is that the type parameter T there is covered by the (local) type Rc.

However, after implementing this rule, we found out that it actually
prohibits a lot of other useful patterns. The most important of them is
the so-called auxiliary pattern, in which a trait takes a type parameter
that is a kind of “configuration” and is basically orthogonal to the types
that the trait is implemented for. An example is the Hash trait:

impl<H> Hash<H> for MyStruct

The type H here represents the hashing function that is being used. As you can imagine,
most types will work with any hashing function. Sadly, this impl is rejected,
because H is not covered by any local type. You could make it work by adding a parameter
H to MyStruct:

impl<H> Hash<H> for MyStruct<H>

But that is very weird, because now when we create our struct we are
also deciding which hash functions can be used with it. You can also
make it work by moving the hash function parameter H to the hash
method itself, but then that is limiting. It makes the Hash trait
not object safe, for one thing, and it also prohibits us from writing
types that are specialized to particular hash functions.

Another similar example is indexing. Many people want to make types indexable
by any integer-like thing, for example:

impl<I: Int, T> Index<I> for Vec<T> {
    type Output = T;
}

Here the type parameter I is also uncovered.

Ordered vs Covered

By now I’ve probably lost you in the ins and outs, so let’s see a
summary. Here’s a table of all the examples I’ve covered so far. I’ve
tweaked the names so that, in all cases, any type that begins with
My is considered local to the current crate:
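The table itself did not survive extraction; the following reconstruction is based on the judgments discussed above (✓ = accepted, ✗ = rejected):

Impl                                        Ordered   Covered
impl Add<i32> for MyBigInt                     ✓         ✓
impl Add<MyBigInt> for i32                     ✓         ✓
impl<T> Add<T> for MyBigInt                    ✓         ✗
impl<T> Add<MyBigInt> for T                    ✗         ✗
impl Modifier<MyType> for Vec<u8>              ✓         ✓
impl<T> Modifier<MyType> for Vec<T>            ✗         ✗
impl<T> Modifier<MyHeader<T>> for Vec<T>       ✗         ✓
impl<T> BorrowFrom<MyRc<T>> for T              ✗         ✓
impl<H> Hash<H> for MyStruct                   ✓         ✗
impl<I: Int, T> Index<I> for MyVec<T>          ✓         ✗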

As you can see, both of these have their advantages. However, the
ordered rule comes out somewhat ahead. In particular, the places where
it fails can often be worked around by reordering parameters, but
there is no answer that permits the covered rule to handle the
Hash example (and there are a number of other traits that fit that
pattern in the standard library).

Hybrid approach #1: Covered self

You might be wondering – if neither rule is perfect, is there a way
to combine them? In fact, the rule that is currently implemented is such
a hybrid. It imposes the covered rules, but only on the Self
parameter. That means that there must be a local type somewhere in
Self, and any type parameters appearing in Self must be covered by
a local type. Let’s call this hybrid CS, for “covered applied to
Self”.

The CS hybrid turns out to miss some important cases that the
pure ordered rule achieves. Notably, it prohibits:

impl Add<MyBigInt> for i32

impl Modifier<MyType> for Vec<u8>

This is not really good enough.

Hybrid approach #2: Covered First

We can improve the covered self approach by saying that some type
parameter of the trait must meet the rules (some local type; impl type
params covered by a local type), but not necessarily Self. Any type parameters
which precede this covered parameter must consist exclusively of remote types (no impl
type parameters, in particular).

One disappointment about the hybrid rules I presented thus far is that
they are inherently ordered. It runs somewhat against my intuition,
which is that the order of the trait type parameters shouldn’t matter
that much. In particular it feels that, for a commutative trait like
Add, the role of the left-hand-side type (Self) and
right-hand-side type should be interchangable (below, I will argue
that in fact some kind of order may well be essential to the notion of
coherence as a whole, but for now let’s assume we want Add to treat
the left- and right-hand-side as equivalent).

However, there are definitely other traits where the parameters are
not equivalent. Consider the Hash trait example we saw before. In
the case of Hash, the type parameter H refers to the hashing
algorithm and thus is inherently not going to be covered by the type
of the value being hashed. It is in some sense completely orthogonal
to the Self type. For this reason, we’d like to define impls that
apply to any hasher, like this one:

impl<H> Hash<H> for MyType { ... }

The problem is, if we permit this impl, then we can’t allow another
crate to define an impl with the same parameters, but in a different
order:

impl<H> Hash<MyType> for H { ... }

One way to permit the first impl and not the second without invoking
ordering is to classify type parameters as self-like and auxiliary.

The orphan rule would require that at least one self-like parameter
references a local type and that all impl type parameters appearing in
self-like types would be covered. The Self type is always self-like,
but other types would be auxiliary unless declared to be self-like (or
perhaps the default would be the opposite).

Here is a table showing how this new “explicit” rule would work,
presuming that the type parameters on Add and Modifier were
declared as self-like. The Hash and Index parameters would be
declared as auxiliary.
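That table was also lost in extraction; a partial reconstruction, following the rule as just described (rows whose outcome I am less sure of are omitted):

Impl                                        Explicit declarations
impl Add<i32> for MyBigInt                          ✓
impl Add<MyBigInt> for i32                          ✓
impl<T> Add<T> for MyBigInt                         ✗
impl<T> Add<MyBigInt> for T                         ✗
impl Modifier<MyType> for Vec<u8>                   ✓
impl<H> Hash<H> for MyStruct                        ✓
impl<I: Int, T> Index<I> for MyVec<T>               ✓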

You can see that it’s quite expressive, though it is very restrictive
about generic impls for Add. However, it would push quite a bit of
complexity onto the users, because now when you create a trait, you
must classify each of its type parameters as self-like or auxiliary.

In defense of ordering

Whereas at first I felt that having the rules take ordering into
account was unnatural, I have come to feel that ordering is, to some
extent, inherent in coherence. To see what I mean, let’s consider an
example of a new vector type, MyVec<T>. It might be reasonable to
permit MyVec<T> to be addable to anything that can be converted into an
iterator over T elements. Naturally, since we’re overloading +,
we’d prefer for it to be commutative:
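The three impls discussed below were lost in extraction; they presumably looked roughly like this (the signatures are a best-effort reconstruction):

impl<T> Add<MyVec<T>> for MyVec<T> { ... }
impl<T, I: Iterator<T>> Add<I> for MyVec<T> { ... }
impl<T, I: Iterator<T>> Add<MyVec<T>> for I { ... }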

The problem is that these three impls are inherently
overlapping. After all, if I try to add two MyVec instances, which
impl do I get?

Now, this isn’t a problem for any of the rules I proposed in this
thread, because all of them reject that pair of impls. In fact, both
the “Covered” and “Explicit Declarations” rules go farther: they
reject both impls. This is because the type parameter I is
uncovered; since the rules don’t consider ordering, they can’t allow
an uncovered iterator I on either the left- or the right-hand-side.

The other variations (“Ordered”, “Covered Self”, and “Covered First”),
on the other hand, allow only one of those impls: the one where
MyVec<T> appears on the left. This seems pretty reasonable. After
all, if we allow you to define an overloaded + that applies to an
open-ended set of types (those that are iterable), there is the
possibility that others will do the same. And if I try to add a
MyVec<int> and a YourVec<int>, both of which are iterable, who
wins? The ordered rules give a clear answer: the left-hand-side wins.

There are other blanket cases that also get prohibited which might on their
face seem to be reasonable. For example, if I have a BigInt type, the ordered
rules allow me to write impls that permit BigInt to be added to any concrete
int type, no matter which side that concrete type appears on:
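The code was lost in extraction; judging from the surrounding text, it contrasted the concrete impls that the ordered rule accepts with an Int-bounded blanket impl that it rejects, roughly:

impl Add<i32> for BigInt { ... } // OK
impl Add<BigInt> for i32 { ... } // OK
// ...and so on for each concrete integer type. However, the
// Int-bounded blanket version of the second impl is rejected:
impl<T: Int> Add<T> for BigInt { ... } // OK: BigInt precedes T
impl<T: Int> Add<BigInt> for T { ... } // ERROR: T precedes BigInt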

Now, this makes some measure of sense because Int is a trait that is
only intended to be implemented for the primitive integers. In
principle all bigints could use these same rules without conflict, so
long as none of them implement Int. But in fact, nothing prevents
them from implementing Int. Moreover, it’s not hard to imagine
other crates creating comparable impls that would overlap with the
ones above:
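A sketch of how such an overlap might arise (OtherBigInt is a hypothetical big integer type from another crate; assume, as discussed above, that nothing stops it from implementing Int):

// In another crate:
impl Int for OtherBigInt { ... }
impl<T: Int> Add<OtherBigInt> for T { ... }

// If such blanket impls were legal, then `BigInt + OtherBigInt`
// would match both this impl and `impl<T: Int> Add<T> for BigInt`.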

In the future, it may be interesting to provide a way to use traits to
create “strata” so that we can say things like “it’s ok to use an
Int-bounded type parameter on the LHS so long as the RHS is bounded
by Foo, which is incompatible with Int”, but it’s a subtle and
tricky issue (as the Show example demonstrates).

So ordering basically means that when you define your traits, you
should put the “principal” type as Self, and then order the other
type parameters so that the more “auxiliary” a parameter is, the
later it appears.

The problem with ordering

Currently I lean towards the “Covered First” rule, but it bothers me
that it allows something like

impl Modifier<MyType> for Vec<u8>

but not

impl<T> Modifier<MyType> for Vec<T>

However, this limitation seems to be pretty inherent to any rules that
do not explicitly identify “auxiliary” type parameters. The reason is
that the ordering variations all use the first occurrence of a local
type as a “signal” that auxiliary type parameters should be permitted
afterwards. This implies that another crate will be able to do
something like:

impl<U> Modifier<U> for Vec<YourType>

In that case, both impls apply to Modifier<MyType> for Vec<YourType>.

Conclusion

This is a long post, and it covers a lot of ground. As I wrote in the
introduction, the orphan rules turn out to be hiding quite a lot of
complexity. Much more than I imagined at first. My goal here is mostly
to lay out all the things that aturon and I have been talking about in
a comprehensive way.

I feel like this all comes down to a key question: how do we identify
the “auxiliary” input type parameters? Ordering-based rules identify
this for each impl based on where the first “local” type
appears. Coverage-based rules seem to require some sort of explicit
declaration on the trait.

I am deeply concerned about asking people to understand this
“auxiliary” vs “self-like” distinction when declaring a trait. On the
other hand, there is no silver bullet: under ordering-based rules,
they will be required to sometimes reorder their type parameters just
to pacify the seemingly random ordering rule. (But I have the feeling
that people intuitively put the most “primary” type first, as Self,
and the auxiliary type parameters later.)

Purging proc

The so-called “unboxed closure” implementation in Rust has reached the
point where it is time to start using it in the standard library. As
a starting point, I have a
pull request that removes proc from the language. I started
on this because I thought it’d be easier than replacing closures, but
it turns out that there are a few subtle points to this transition.

I am writing this blog post to explain what changes are in store and
give guidance on how people can port existing code to stop using
proc. This post is basically targeted at Rust devs who want to adapt
existing code, though it also covers the closure design in general.

To some extent, the advice in this post is a snapshot of the current
Rust master. Some of it is specifically targeting temporary
limitations in the compiler that we aim to lift by 1.0 or shortly
thereafter. I have tried to mention when that is the case.

The new closure design in a nutshell

For those who haven’t been following, Rust is moving to a powerful new
closure design (sometimes called unboxed closures). This part of the
post covers the highlights of the new design. If you’re already
familiar, you may wish to skip ahead to the “Transitioning away from
proc” section.

The basic idea of the new design is to unify closures and traits. The
first part of the design is that function calls become an overloadable
operator. There are three possible traits that one can use to overload
():
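The trait definitions did not survive extraction; roughly as they stood at the time (modulo details such as the "rust-call" calling convention), they were:

trait Fn<A, R> {
    fn call(&self, args: A) -> R;
}

trait FnMut<A, R> {
    fn call_mut(&mut self, args: A) -> R;
}

trait FnOnce<A, R> {
    fn call_once(self, args: A) -> R;
}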

As you can see, these traits differ only in their “self” parameter.
In fact, they correspond directly to the three “modes” of Rust
operation:

The Fn trait is analogous to a “shared reference” – it means that
the closure can be aliased and called freely, but in turn the
closure cannot mutate its environment.

The FnMut trait is analogous to a “mutable reference” – it means
that the closure cannot be aliased, but in turn the closure is
permitted to mutate its environment. This is how || closures work
in the language today.

The FnOnce trait is analogous to “ownership” – it means that the
closure can only be called once. This allows the closure to move out
of its environment. This is how proc closures work today.

Enabling static dispatch

One downside of the older Rust closure design is that closures and
procs always implied virtual dispatch. In the case of procs, there was
also an implied allocation. By using traits, the newer design allows
the user to choose between static and virtual dispatch. Generic types
use static dispatch but require monomorphization, and object types use
dynamic dispatch and hence avoid monomorphization and grant somewhat
more flexibility.

As an example, whereas before I might write a function that takes a
closure argument as follows:
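The two signatures were elided; presumably they contrasted the old boxed closure syntax with the new trait-based version (a reconstruction):

// Old closure syntax:
fn foo(hashfn: |&String| -> uint) { ... }

// New version using an unboxed closure:
fn foo<F>(hashfn: F)
    where F: FnMut(&String) -> uint
{ ... }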

Note that we write the type parameters to FnMut using parentheses
syntax (FnMut(&String) -> uint). This is a convenient syntactic
sugar that winds up mapping to a traditional trait reference
(currently, for<'a> FnMut<(&'a String,), uint>). At the moment,
though, you are required to use the parentheses form, because we
wish to retain the liberty to change precisely how the Fn trait type
parameters work.

A caller of foo() might write:

let some_salt: String = ...;
foo(|str| myhashfn(str.as_slice(), &some_salt))

You can see that the || expression still denotes a closure. In fact,
the best way to think of it is that a || expression generates a
fresh structure that has one field for each of the variables it
touches. It is as if the user wrote:
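The desugared version was elided; based on the ClosureEnvironment struct shown later in this post, it was roughly (the FnMut impl is sketched as a comment):

let some_salt: String = ...;

struct ClosureEnvironment<'env> {
    some_salt: &'env String,
}

// ...plus an impl of `FnMut(&String) -> uint` for `ClosureEnvironment`
// whose call method runs `myhashfn(str.as_slice(), self.some_salt)`:
foo(ClosureEnvironment { some_salt: &some_salt })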

Using object types to get virtual dispatch

The downside of using generic type parameters for closures is that you
will get a distinct copy of the fn being called for every
callsite. This is a great boon to inlining (at least sometimes), but
it can also lead to a lot of code bloat. It’s also often just not
practical: many times we want to combine different kinds of closures
together into a single vector. None of these concerns are specific to
closures. The same things arise when using traits in general. The nice
thing about the new closure design is that it lets us use the same
tool – object types – in both cases.

If I wanted to write my foo() function to avoid monomorphization,
I might change it from:

fn foo<F>(hashfn: F)
    where F: FnMut(&String) -> uint
{ ... }

to:

fn foo(hashfn: &mut FnMut(&String) -> uint) { ... }

Note that the argument is now a &mut FnMut(&String) -> uint, rather
than being of some type F where F : FnMut(&String) -> uint.

One downside of changing the signature of foo() as I showed is that
the caller has to change as well. Instead of writing:

foo(|str| ...)

the caller must now write:

foo(&mut |str| ...)

Therefore, what I expect to be a very common pattern is to have a
“wrapper” that is generic which calls into a non-generic inner function:
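The example was elided; a sketch of the pattern (foo_obj is an assumed name for the non-generic inner function):

pub fn foo<F>(mut hashfn: F)
    where F: FnMut(&String) -> uint
{
    foo_obj(&mut hashfn)
}

fn foo_obj(hashfn: &mut FnMut(&String) -> uint) {
    // ...the actual work, via virtual dispatch...
}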

This way, the caller does not have to change; only the outer wrapper
is monomorphized, and it will likely be inlined away, while the
“guts” of the function continue to use virtual dispatch.

In the future, I’d like to make it possible to pass object types (and other
“unsized” types) by value, so that one could write a function that just
takes a FnMut() and not a &mut FnMut():

fn foo(hashfn: FnMut(&String) -> uint) { ... }

Among other things, this makes it possible to transition simply
between static and virtual dispatch without altering callers and
without creating a wrapper fn. However, it would compile down to
roughly the same thing as the wrapper fn in the end, though with
guaranteed inlining. This change requires somewhat more design and
will almost surely not occur by 1.0, however.

Specifying the closure type explicitly

We just said that every closure expression like || expr generates a
fresh type that implements one of the three traits (Fn, FnMut, or
FnOnce). But how does the compiler decide which of the three traits
to use?

Currently, the compiler is able to do this inference based on the
surrounding context – basically, the closure was an argument to a
function, and that function requested a specific kind of closure, so
the compiler assumes that’s the one you want. (In our example, the
function foo() required an argument of type F where F implements
FnMut.) In the future, I hope to improve the inference to a more
general scheme.

Because the current inference scheme is limited, you will sometimes
need to specify which of the three fn traits you want
explicitly. (Some people also just prefer to do that.) The current
syntax is to use a leading &:, &mut:, or :, kind of like an
“anonymous parameter”:

// Explicitly create a `Fn` closure which cannot mutate its
// environment. Even though `foo()` requested `FnMut`, this closure
// can still be used, because a `Fn` closure is more general
// than `FnMut`.
foo(|&:| { ... })

// Explicitly create a `FnMut` closure. This is what the
// inference would select anyway.
foo(|&mut:| { ... })

// Explicitly create a `FnOnce` closure. This would yield an
// error, because `foo` requires a closure it can call multiple
// times in a row, but it is being given a closure that can be
// called exactly once.
foo(|:| { ... }) // (ERROR)

The main time you need to use an explicit fn type annotation is when
there is no context. For example, if you were just to create a closure
and assign it to a local variable, then a fn type annotation is
required:

let c = |&mut:| { ... };

Caveat: It is still possible we’ll change the &:/&mut:/:
syntax before 1.0; if we can improve inference enough, we might even
get rid of it altogether.

Moving vs non-moving closures

There is one final aspect of closures that is worth covering. We gave the
example of a closure |str| myhashfn(str.as_slice(), &some_salt)
that expands to something like:

struct ClosureEnvironment<'env> {
    some_salt: &'env String
}

Note that the variable some_salt that is used from the surrounding
environment is borrowed (that is, the struct stores a reference to
the string, not the string itself). This is frequently what you want,
because it means that the closure just references things from the
enclosing stack frame. This also allows closures to modify local
variables in place.

However, capturing upvars by reference has the downside that the
closure is tied to the stack frame that created it. This is a problem
if you would like to return the closure, or use it to spawn another
thread, etc.

For this reason, closures can also take ownership of the things that
they close over. This is indicated by using the move keyword before
the closure itself (because the closure “moves” things out of the
surrounding environment and into the closure). Hence if we change
that same closure expression we saw before to use move:

move |str| myhashfn(str.as_slice(), &some_salt)

then it would generate a closure type where the some_salt variable
is owned, rather than being a reference:

struct ClosureEnvironment {
    some_salt: String
}

This is the same behavior that proc has. Hence, whenever we replace
a proc expression, we generally want a moving closure.

Currently we never infer whether a closure should be move or not.
In the future, we may be able to infer the move keyword in some
cases, but it will never be 100% (specifically, it should be possible
to infer that the closure passed to spawn should always take
ownership of its environment, since it must meet the 'static bound,
which is not possible any other way).

Transitioning away from proc

This section covers what you need to do to modify code that was using
proc so that it works once proc is removed.

Transitioning away from proc for library users

For users of the standard library, the transition away from proc is
fairly straightforward. Mostly it means that code which used to write
proc() { ... } to create a “procedure” should now use move|| {
... }, to create a “moving closure”. The idea of a moving closure
is that it is a closure which takes ownership of the variables in its
environment. (Eventually, we expect to be able to infer whether or not
a closure must be moving in many, though not all, cases, but for now
you must write it explicitly.)

Hence converting calls to libstd APIs is mostly a matter of
search-and-replace:
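The conversion table was elided; the gist is a mechanical rewrite along these lines (do_work is a placeholder):

// Before:
spawn(proc() { do_work() });

// After:
spawn(move || { do_work() });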

The main exception is when there is no surrounding context, such as when storing a closure into a local variable. In that case, if you simply write move||, you will get some strange errors:

let x = move || { ... };

The problem is that, as discussed before, the compiler needs context
to determine what sort of closure you want (that is, Fn vs FnMut
vs FnOnce). Therefore it is necessary to explicitly declare the sort
of closure using the : syntax:

let x = proc() { ... };
// becomes:
let x = move |:| { ... };

Note also that it is precisely when there is no context that you must
also specify the types of any parameters. Hence something like:
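The example was elided; presumably something along these lines (the parameter name and type are illustrative):

let x = proc(s: String) { ... };
// becomes:
let x = move |: s: String| { ... };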

Transitioning away from proc for library authors

The transition story for a library author is somewhat more
complicated. The complication is that the equivalent of a type like
proc():Send ought to be Box<FnOnce() + Send> – that is, a boxed
FnOnce object that is also sendable. However, we don’t currently
have support for invoking fn(self) methods through an object, which
means that if you have a Box<FnOnce()> object, you can’t call its
call_once method (put another way, the FnOnce trait is not object
safe). We plan to fix this – possibly by 1.0, but possibly shortly
thereafter – but in the interim, there are workarounds you can use.

In the standard library, we use a trait called Invoke (and, for
convenience, a type called Thunk). You’ll note that although these
two types are publicly available (under std::thunk), these types do
not appear in the public interface of any other stable API. That is,
Thunk and Invoke are essentially implementation details that end
users do not have to know about. We recommend you follow the same
practice. This is for two reasons:

It generally makes for a better API. People would rather write
Thread::spawn(move|| ...) and not
Thread::spawn(Thunk::new(move|| ...)) (etc).

Eventually, once Box<FnOnce()> works properly, Thunk and
Invoke may become deprecated. If this were to happen, your
public API would be unaffected.

Basically, the idea is to follow the “thin wrapper” pattern that I
showed earlier for hiding virtual dispatch. If you recall, I gave the
example of a function foo that wished to use virtual dispatch
internally but to hide that fact from its clients. It did so by creating
a thin wrapper API that just called into another API, performing the
object coercion:
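The snippet was elided; it had the same shape as the foo/foo_obj sketch shown earlier:

pub fn foo<F>(mut hashfn: F)
    where F: FnMut(&String) -> uint
{
    foo_obj(&mut hashfn) // object coercion happens here
}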

The idea with Invoke is similar. The public APIs are generic APIs
that accept any FnOnce value. These just turn around and wrap that
value up into an object. Here the problem is that while we would
probably prefer to use a Box<FnOnce()> object, we can’t because
FnOnce is not (currently) object-safe. Therefore, we use the trait
Invoke (I’ll show you how Invoke is defined shortly, just let me
finish this example):
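The example did not survive extraction; a sketch of the pattern as it might apply to something like Thread::spawn (spawn_inner is an assumed name), followed by the Invoke trait roughly as it was defined in std::thunk at the time:

pub fn spawn<F>(f: F)
    where F: FnOnce() + Send
{
    spawn_inner(Thunk::new(f))
}

fn spawn_inner(thunk: Thunk) {
    // ...eventually calls `thunk.invoke(())`...
}

pub trait Invoke<A = (), R = ()> {
    fn invoke(self: Box<Self>, arg: A) -> R;
}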

The choice between static and virtual dispatch can be changed without
affecting users and without requiring wrapper functions.

I expect the improvements in inference to land before 1.0. Fixing the final
two points is harder and so we will have to see where it falls on the
schedule, but if it cannot be done for 1.0 then I would expect to see
those changes shortly thereafter.

Allocators in Rust

There has been a lot of discussion lately about Rust’s allocator
story, and in particular our relationship to jemalloc. I’ve been
trying to catch up, and I wanted to try and summarize my understanding
and explain for others what is going on. I am trying to be as
factually precise in this post as possible. If you see a factual
error, please do not hesitate to let me know.

The core tradeoff

The story begins, like all interesting design questions, with a
trade-off. The problem with trade-offs is that neither side is 100%
right. In this case, the trade-off has to do with two partial truths:

It is better to have one global allocator than two. Allocators like
jemalloc, dlmalloc, and so forth are all designed to be the
only allocator in the system. Of necessity they permit a certain
amount of “slop”, allocating more memory than they need so that
they can respond to requests faster, or amortizing the cost of
metadata over many allocations. If you use two different
allocators, you are paying those costs twice. Moreover, the
allocator tends to be a hot path, and you wind up with two copies
of it, which leaves less room in the instruction cache for your
actual code.

Some allocators are more efficient than others. In particular,
the default allocators shipped with libc on most systems tend not
to be very good, though there are exceptions. One particularly good
allocator is jemalloc. In comparison to the default glibc or
windows allocator, jemalloc can be noticeably more efficient both
in performance and memory use. Moreover, jemalloc offers an
extended interface that Rust can take advantage of to gain even
more efficiency (for example, by specifying the sizes of a memory
block when it is freed, or by asking to reallocate memory in place
when possible).

Clearly, the best thing is to use just one allocator that is also
efficient. So, to be concrete, whenever we produce a Rust executable,
everyone would prefer if that Rust executable – along with any C code
that it uses – would just use jemalloc everywhere (or whatever
allocator we decide is ‘efficient’ tomorrow).

However, in some cases we can’t control what allocator other code will
use. For example, if a Rust library is linked into a larger C
program. In this case, we can opt to continue using jemalloc from
within that Rust code, but the C program may simply use the normal
allocator. And then we wind up with two allocators in use. This is
where the trade-off comes into play. Is it better to have Rust use
jemalloc even when the C program within which Rust is embedded does
not? In that case, the Rust allocations are more efficient, but at the
cost of having more than one global allocator, with the associated
inefficiencies. I think this is the core question.

Two extreme designs

Depending on whether you want to prioritize using a single allocator
or using an efficient allocator, there are two extreme designs one
might advocate for the Rust standard library:

When Rust needs to allocate memory, just call malloc and friends.

Compile Rust code to invoke jemalloc directly. This is what we
currently do. There are many variations on how to do
this. Regardless of which approach you take, this has the downside
that when Rust code is linked into C code, there is the possibility
that the C code will use one allocator, and Rust code another.

It’s important to clarify that what we’re discussing here is really
the default behavior, to some extent. The Rust standard library
already isolates the definition of the global allocator into a
particular crate. End users can opt to change the definition of that
crate. However, it would require recompiling Rust itself to do so,
which is at least a mild pain.

Calling malloc

If we opted to default to just calling malloc, this does not mean
that end users are locked into the libc allocator or anything like
that. There are existing mechanisms for changing what allocator is
used at a global level (though I understand this is relatively hard on
Windows). Presumably when we produce an actual Rust executables, we
would default to using jemalloc.

Calling malloc has the advantage that if a Rust library is linked into
a C program, both of them will be using the same global allocator,
whatever it is (unless of course that C program itself doesn’t call
malloc).

However, one downside of this is that we are not able to take
advantage of the more advanced jemalloc APIs for sized deallocation
and reallocation. This has a measurable effect in micro-benchmarks.
I am not aware of any measurements on larger scale Rust applications,
but there are definitely scenarios where the advanced APIs are useful.

Another potential downside of this approach is that malloc is called
via indirection (because it is part of libc; I’m a bit hazy on the
details of this point, and would appreciate clarification). This
implies a somewhat higher overhead for calls to malloc/free than if we
fixed the allocator ahead of time. It’s worth noting that this is the
normal setup that all C programs use by default, so relative to a
typical C program, this setup carries no overhead.

(When compiling a statically linked executable, rustc could opt to
redirect malloc and friends to jemalloc at this point, which would
eliminate the indirection overhead but not take advantage of the
specialized jemalloc APIs. This would be a simplified variant of the
hybrid scheme I eventually describe below.)

Calling jemalloc directly

If we opt to hardcode Rust’s default allocator to be jemalloc, we gain
several advantages. The performance of Rust code, at least, is not
subject to the whims of whatever global allocator the platform or
end-user provides. We are able to take full advantage of the
specialized jemalloc APIs. Finally, as the allocator is fixed to
jemalloc ahead of time, static linking scenarios do not carry the
additional overhead that calling malloc implies (though, as I noted,
one can remove that overhead also when using malloc via a simple
hybrid scheme).

Having Rust code unilaterally call jemalloc also carries downsides. For
example, if Rust code is embedded as a library, it will not adopt the
global allocator of the code that it is embedded within. This carries
the performance downsides of multiple allocators but also a certain
amount of risk, because a pointer allocated on one side cannot be
freed on the other (some argue this is bad practice; this is certainly
true if you do not know that the two sides are using the same
allocator, but is otherwise legitimate, see the section below for more
details).

The same problem can also occur in reverse, when C code is used from
within Rust. This happens today with rustc: due to the specifics of
our setup, LLVM uses the system allocator, not the jemalloc allocator
that Rust is using. This causes extra fragmentation and memory
consumption. It’s also not great because jemalloc is better than the
system allocator in many cases.

To prefix or not to prefix

One specific aspect of calling jemalloc directly concerns how it is
built. Today, we build jemalloc using name prefixes, effectively
“namespacing” it so that it does not interfere with the system
allocator. This is what causes LLVM to use a different allocator in
rustc. This has the advantage of clarity and side-stepping certain
footguns around dynamic linking that could otherwise occur, but at the
cost of forking the allocators.

A recent PR aimed to remove the prefix. It was rejected
because in a dynamic linking scenario, this creates a fragile
situation. Basically, the dynamic library (“client”) defines malloc
to be jemalloc. The host process also has a definition for malloc
(the system allocator). The precise result will depend on the flags
and platform that you’re running on, but there are basically two
possible outcomes, and both can cause perfectly legitimate code to
crash:

The host process wins, malloc means the same thing
everywhere (this occurs on linux by default).

malloc means different things in the host and the client
(this occurs on mac by default, and on linux with the
DEEPBIND flag).

In the first case, crashes can arise if the client code should try to
intermingle usage of the nonstandard jemalloc API (which maps to
jemalloc) with the standard malloc API (which the client believes to
also be jemalloc, but which has been remapped to the system allocator
by the host). The jemalloc documentation isn’t 100% explicit
on the matter, but I believe it is legal for code to (e.g.) call
mallocx and then call free on the result. Hence if Rust should
link some C code that did that, it would crash under the first
scenario.

In the second case, crashes can arise if the host/client attempt to
transfer ownership of memory. Some claim that this is not a legitimate
thing to do, but that is untrue: it is (usually) perfectly legal for
client code to (e.g.) call strdup and then pass the result back to
the host, expecting the host to free it. (Granted, it is best to be
cautious when transferring ownership across boundaries like this, and
one should never call free on a pointer unless you can be sure of
the allocator that was used to allocate that pointer in the first
place. But if you are sure, then it should be possible.)

UPDATE: I’ve been told that on Windows, freeing across DLL
boundaries is something you can never do. On Reddit,
Mr_Alert writes: “In Windows, allocating memory in one DLL and
freeing it in another is very much illegitimate. Different compiler
versions have different C runtimes and therefore different
allocators. Even with the same compiler version, if the EXE or DLLs
have the C runtime statically linked, they’ll have different copies of
the allocator. So, it would probably be best to link rust_alloc to
jemalloc unconditionally on Windows.” Given the number of differences
between platforms, it seems likely that the best behavior will
ultimately be platform dependent.

Fundamentally, the problems here are due to the fact that the client
is attempting to redefine the allocator on behalf of the host. Forcing
this kind of name conflict to occur intentionally seems like a bad
idea if we can avoid it.

A hybrid scheme

There is also the possibility of various hybrid schemes. One such
option that Alex Crichton and I put together, summarized in
this gist, would be to have Rust call neither the standard
malloc nor the jemalloc symbols, but rather an intermediate set of
APIs (let’s call them rust_alloc). When compiling Rust libraries
(“rlibs”), these APIs would be unresolved. These rust allocator APIs
would take all the information they need to take full advantage of
extended jemalloc APIs, if they are available, but could also be
“polyfilled” using the standard system malloc interface.
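To illustrate, such an intermediate interface might look roughly like this (the names and signatures here are assumptions for illustration, not necessarily those from the gist):

extern {
    fn rust_alloc(size: uint, align: uint) -> *mut u8;
    fn rust_realloc(ptr: *mut u8, old_size: uint,
                    size: uint, align: uint) -> *mut u8;
    fn rust_dealloc(ptr: *mut u8, size: uint, align: uint);
}

Passing sizes and alignments everywhere is what would let a jemalloc-backed implementation use the extended (sized) APIs, while a malloc-backed “polyfill” could simply ignore the extra arguments.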

So long as Rust libraries are being compiled into “rlibs”, these
rust_alloc dependencies would remain unresolved. An rlib is
basically a statically linked library that can be linked into another
Rust program. At some point, however, a final artifact is produced, at
which point the rust_alloc dependency must be fulfilled. The way we
fulfill this dependency will ultimately depend on what kind of
artifact is produced:

Static library for use in a C program: link rust_alloc to malloc

Dynamic library (for use in C or Rust): link rust_alloc to malloc

Executable: resolve rust_alloc to jemalloc, and override the
system malloc with jemalloc as well.

This seems to offer the best of both worlds. Standalone, statically
linked Rust executables (the recommended, default route) get the full
benefit of jemalloc. Code that is linked into C or dynamically loaded
uses the standard allocator by default. Any C code used from within
Rust executables will also call into jemalloc as well.

However, there is one major caveat. While it seems that this scheme
would work well on linux, the behavior on other platforms is
different, and it’s not yet clear if the same scheme can be made to
work as well on Mac and Windows.

Naturally, even if we sort out the cross-platform challenges, this
hybrid approach too is not without its downsides. It means that Rust
code built for libraries will not take full advantage of what jemalloc
has to offer, and in the case of dynamic libraries there may be more
overhead per malloc invocation than if jemalloc were statically
linked. However, by the same token, Rust libraries will avoid the
overhead of using two allocators and they will also be acting more
like normal C code. And of course the embedding program may opt, in
its linking phase, to redirect malloc (globally) to jemalloc.

So what should we do?

The decision about what to do has a couple of facets. In the immediate
term, however, we need to take steps to improve rustc’s memory
usage. It seems to me that, at minimum, we ought to accept
strcat’s PR #18915, which ensures that Rust executables can
use jemalloc for everything, at least on linux. Everyone agrees that
this is a desirable goal.

Longer term, it is somewhat less clear. The reason that this decision
is difficult is that there is no choice that is “correct” for all
cases. The most performant choice will depend on the specifics of the
case:

Is the Rust code embedded?

How much allocation takes place in Rust vs in the other language?

What allocator is the other language using?

(As an example, the performance and memory use of rustc improved
when we adopted jemalloc, even partially, but other applications will
fare differently.)

At this point I favor the general principle that Rust code, when
compiled as a library for use within C code, should more-or-less
behave like C code would behave. This seems to suggest that, when
building libraries for C consumption, Rust should just call malloc,
and people can use the normal mechanisms to inject jemalloc if they so
choose. However, when compiling Rust executables, it seems
advantageous for us to default to a better allocator and to get the
maximum efficiency we can from that allocator. The hybrid scheme aims
to achieve both of these goals but there may be a better way to go
about it, particularly around the area of dynamic linking.

I’d like to see more measurement regarding the performance impact of
foregoing the specialized jemalloc APIs and using weak linking. I’ve
seen plenty of numbers suggesting jemalloc is better than other
allocators on the whole, and plenty of numbers saying that using
specialized APIs helps in microbenchmarks. But it is unclear what the
impact of such APIs (or weak linking) is on the performance of larger
applications.

I’d also like to get the input from more people who have experience in
this area. I’ve talked things over with strcat a fair
amount, who generally favors using jemalloc even if it means two
allocators. We’ve also reached out to Jason Evans, the author of
jemalloc, who stressed the fact that multiple global allocators is
generally a poor choice. I’ve tried to reflect their points in this
post.

Note though that whatever we decide we can evolve it as we go. There
is time to experiment and measure. One thing that is clear to me is
that we do not want Rust to “depend on” jemalloc in any hard
sense. That is, it should always be possible to switch from jemalloc
to another allocator. This is both because jemalloc, good as it is,
can’t meet everyone’s needs all the time, and because it’s just not a
necessary dependency for Rust to take. Establishing an abstraction
boundary around the “Rust global allocator” seems clearly like the
right thing to do, however we choose to fulfill it.

Multi- and conditional dispatch in traits

I’ve been working on a branch that implements both multidispatch
(selecting the impl for a trait based on more than one input type) and
conditional dispatch (selecting the impl for a trait based on where
clauses). I wound up taking a direction that is slightly different
from what is described in the trait reform RFC, and I
wanted to take a chance to explain what I did and why. The main
difference is that in the branch we move away from the crate
concatenability property in exchange for better inference and less
complexity.

The various kinds of dispatch

The first thing to explain is what the difference is between these
various kinds of dispatch.

Single dispatch. Let’s imagine that we have a conversion trait:

trait Convert<Target> {
    fn convert(&self) -> Target;
}

This trait just has one method. It’s about as simple as it gets. It
converts from the (implicit) Self type to the Target type. If we
wanted to permit conversion between int and uint, we might
implement Convert like so:
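The impls themselves were elided; given the discussion that follows, they must have been essentially:

impl Convert<uint> for int { ... } // int -> uint
impl Convert<int> for uint { ... } // uint -> int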

Now, in the background here, Rust has this check we call
coherence. The idea is (at least as implemented in the master
branch at the moment) to guarantee that, for any given Self type,
there is at most one impl that applies. In the case of these two
impls, that’s satisfied. The first impl has a Self of int, and the
second has a Self of uint. So whether we have a Self of int or
uint, there is at most one impl we can use (and if we don’t have a
Self of int or uint, there are zero impls, that’s fine too).

Multidispatch. Now imagine we wanted to go further and allow int
to be converted to some other type MyInt. We might try writing an
impl like this:

struct MyInt { i: int }

impl Convert<MyInt> for int { ... } // int -> MyInt

Unfortunately, now we have a problem. If Self is int, we now have
two applicable conversions: one to uint and one to MyInt. In a
purely single dispatch world, this is a coherence violation.

The idea of multidispatch is to say that it’s ok to have multiple
impls with the same Self type as long as at least one of their
other type parameters are different. So this second impl is ok,
because the Target type parameter is MyInt and not uint.

Conditional dispatch. So far we have dealt only in concrete types
like int and MyInt. But sometimes we want to have impls that apply
to a category of types. For example, we might want to have a
conversion from any type T into MyInt, as long as that type
supports a MyGet trait:
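The trait and its blanket impl were elided; reconstructed from the surrounding discussion (the get signature is an assumption):

trait MyGet {
    fn get(&self) -> MyInt;
}

impl<T> Convert<MyInt> for T
    where T: MyGet
{ ... }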

We call impls like this, which apply to a broad group of types,
blanket impls. So how do blanket impls interact with the coherence
rules? In particular, does the conversion from T to MyInt conflict
with the impl we saw before that converted from int to MyInt? In
my branch, the answer is “only if int implements the MyGet trait”.
This seems obvious but turns out to have a surprising amount of
subtlety to it.

Crate concatenability and inference

In the trait reform RFC, I mentioned a desire to support crate
concatenability, which basically means that you could take two crates
(Rust compilation units), concatenate them into one crate, and
everything would keep building. It turns out that the coherence rules
already basically guarantee this without any further thought –
except when it comes to inference. That’s where things get
interesting.

To see what I mean, let’s look at a small example. Here we’ll use the
same Convert trait as we saw before, but with just the original set
of impls that convert between int and uint. Now imagine that I
have some code which starts with an int and tries to call convert()
on it:
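The snippet was elided; given the annotated version shown later, it was presumably:

let x: int = ...;
let y = x.convert();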

What can we say about the type of y here? Clearly the user did not
specify it and hence the compiler must infer it. If we look at the set
of impls, you might think that we can infer that y is of type
uint, since the only thing you can convert an int into is a uint.
And that is true – at least as far as this particular crate goes.

However, if we consider beyond a single crate, then it is possible
that some other crate comes along and adds more impls. For example,
perhaps another crate adds the conversion to the MyInt type that we
saw before:

struct MyInt { i: int }

impl Convert<MyInt> for int { ... } // int -> MyInt

Now, if we were to concatenate those two crates together, then this
type inference step wouldn’t work anymore, because int can now be
converted to either uint or MyInt. This means that the snippet
of code we saw before would probably require a type annotation to clarify
what the user wanted:

let x: int = ...;
let y: uint = x.convert();

Crate concatenation and conditional impls

I just showed that the crate concatenability principle interferes with
inference in the case of multidispatch, but that is not necessarily
bad. It may not seem so harmful to clarify both the type you are
converting from and the type you are converting to, even if there is
only one type you could legally choose. Also, multidispatch is fairly
rare; most traits have a single type that decides on the impl and
then all other types are uniquely determined. Moreover, with the
associated types RFC, there is even a syntactic way to
express this.

However, when you start trying to implement conditional dispatch
(that is, dispatch predicated on where clauses), crate concatenability
becomes a real problem. To see why, let’s look at a different trait
called Push. The purpose of the Push trait is to describe
collection types that can be appended to. It has one associated type
Elem that describes the element types of the collection:

trait Push {
    type Elem;
    fn push(&mut self, elem: Elem);
}

We might implement Push for a vector like so:

impl<T> Push for Vec<T> {
    type Elem = T;
    fn push(&mut self, elem: T) { ... }
}

(This is not how the actual standard library works, since push is an
inherent method, but the principles are all the same and I didn’t want
to go into inherent methods at the moment.) OK, now imagine I have
some code that is trying to construct a vector of char:

let mut v = Vec::new();
v.push('a');
v.push('b');
v.push('c');

The question is, can the compiler resolve the calls to push() here?
That is, can it figure out which impl is being invoked? (At least in
the current system, we must be able to resolve a method call to a
specific impl or type bound at the point of the call – this is a
consequence of having type-based dispatch.) Somewhat surprisingly, if
we’re strict about crate concatenability, the answer is no.

The reason has to do with DST. The impl for Push that we saw before
in fact has an implicit where clause:

impl<T> Push for Vec<T> where T: Sized { ... }

This implies that some other crate could come along and implement Push for
an unsized type:

impl<T> Push for Vec<[T]> { ... }

Now, when we consider a call like v.push('a'), the compiler must
pick the impl based solely on the type of the receiver v. At the
point of calling push, all we know is that the type of v is a
vector, but we don’t know what it’s a vector of – to infer the
element type, we must first resolve the very call to push that we
are looking at right now.

Clearly, not being able to call push without specifying the type of
elements in the vector is very limiting. There are a couple of ways to
resolve this problem. I’m not going to go into detail on these solutions,
because they are not what I ultimately opted to do. But briefly:

We could introduce some new syntax for distinguishing conditional
dispatch vs other where clauses (basically the input/output
distinction that we use for type parameters vs associated types).
Perhaps a when clause, used to select the impl, versus a where
clause, used to indicate conditions that must hold once the impl is
selected, but which are not checked beforehand. Hard to understand
the difference? Yeah, I know, I know.

We could use an ad-hoc rule to distinguish the input/output clauses.
For example, all predicates applied to type parameters that are
directly used as an input type. Limiting, though, and non-obvious.

We could create a much more involved reasoning system (e.g., in this
case, Vec::new() in fact yields a vector whose types are known to
be sized, but we don’t take this into account when resolving the
call to push()). Very complicated, unclear how well it will work
and what the surprising edge cases will be.

Or… we could just abandon crate concatenability. But wait, you ask,
isn’t it important?

Limits of crate concatenability

So we’ve seen that crate concatenability conflicts with inference and
it also interacts negatively with conditional dispatch. I now want to
call into question just how valuable it is in the first place. Another
way to phrase crate concatenability is to say that it allows you to
always add new impls without disturbing existing code using that
trait. This is actually a fairly limited guarantee. It is still
possible for adding impls to break downstream code across two
different traits, for example. Consider the following example:
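The example code was elided; from the description that follows, it was essentially:

trait Cowboy {
    fn draw(&self);
}
impl Cowboy for Player { ... }

trait Image {
    fn draw(&self);
}
impl Image for Polygon { ... }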

Here you have two traits with the same method name (draw). However,
the first trait is implemented only on Player and the other on
Polygon. So the two never actually come into conflict. In
particular, if I have a player player and I write player.draw(), it could
only be referring to the draw method of the Cowboy trait.

But what happens if I add another impl for Image?

impl Image for Player { ... }

Now suddenly a call to player.draw() is ambiguous, and we need to
use so-called “UFCS” notation to disambiguate (e.g.,
Player::draw(&player)).

(Incidentally, this ability to have type-based dispatch is a great
strength of the Rust design, in my opinion. It’s useful to be able to
define method names that overlap and where the meaning is determined
by the type of the receiver.)

Conclusion: drop crate concatenability

So I’ve been turning these problems over for a while. After some
discussions with others, aturon in particular, I feel the best fix is
to abandon crate concatenability. This means that the algorithm for
picking an impl can be summarized as:

Search the impls in scope and determine those whose types can be
unified with the current types in question and hence could possibly
apply.

If there is more than one impl in that set, start evaluating where clauses to
narrow it down.

This is different from the current master in two ways. First of all,
to decide whether an impl is applicable, we use simple unification
rather than a one-way match. Basically this means that we allow impl
matching to affect inference, so if there is at most one impl that can
match the types, it’s ok for the compiler to take that into account.
This covers the let y = x.convert() case. Second, we don’t consider
the where clauses unless they are needed to remove ambiguity.

I feel pretty good about this design. It is somewhat less pure, in
that it blends the role of inputs and outputs in the impl selection
process, but it seems very usable. Basically, impl selection is guided
only by the ambiguities that really exist, not those that could
theoretically exist in the future. This avoids forcing the user to
classify everything, and in particular avoids classifying where
clauses according to when they are evaluated in the impl selection
process. Moreover, I don’t believe it introduces any significant
compatibility hazards that were not already present in some form or
another.

]]>2014-09-11T07:33:00-04:00http://smallcultfollowing.com/babysteps/blog/2014/09/11/attribute-and-macro-syntaxA few weeks back pcwalton introduced a PR that aimed to move the
attribute and macro syntax to use a leading @ sigil. This means that
one would write macros like:

@format("SomeString: {}", 22)

or

@vec[1, 2, 3]

One would write attributes in the same way:

@deriving(Eq)
struct SomeStruct {
}
@inline
fn foo() { ... }

This proposal was controversial, and the debate has been sitting for a
week or so. I spent some time last week reading every single comment,
and I wanted to lay out my current thoughts.

Why change it?

There were basically two motivations for introducing the change.

Free the bang. The first was to “free up” the ! sign. The
initial motivation was aturon’s error-handling RFC, but I think that
even if we decide not to act on that specific proposal, it’s still
worth trying to reserve ! and ? for something related to
error-handling. We are very limited in the set of characters we can
realistically use for syntactic sugar, and ! and ? are valuable
“ASCII real-estate”.

Part of the reason for this is that ! has a long history of being
the sigil one uses to indicate something dangerous or
surprising. Basically, something you should pay extra attention
to. This is partly why we chose it for macros, but in truth macros are
not dangerous. They can be mildly surprising, in that they don’t
necessarily act like regular syntax, but having a distinguished macro
invocation syntax already serves the job of alerting you to that
possibility. Once you know what a macro does, it ought to just fade
into the background.

Decorators and macros. Another strong motivation for me is that I
think attributes and macros are two sides of the same coin and thus
should use similar syntax. Perhaps the most popular attribute –
deriving – is literally nothing more than a macro. The only
difference is that its “input” is the type definition to which it is
attached (there are some differences in the implementation side
presently – e.g., deriving is based off the AST – but as I discuss
below I’d like to erase that distinction eventually). That said, right
now attributes and macros live in rather distinct worlds, so I think a
lot of people view this claim with skepticism. So allow me to expand
on what I mean.

How attributes and macros ought to move closer together

Right now attributes and macros are quite distinct, but looking
forward I see them moving much closer together over time. Here are
some of the various ways.

Attributes taking token trees. Right now attribute syntax is kind
of specialized. Eventually I think we’ll want to generalize it so that
attributes can take arbitrary token trees as arguments, much like
macros operate on token trees (if you’re not familiar with token
trees, see the appendix). Using token trees would allow more complex
arguments to deriving and other decorators. For example, it’d be great
to be able to say:

@deriving(Encodable(EncoderTypeName<foo>))

where EncoderTypeName<foo> is the name of the specific encoder that
you wish to derive an impl for, vs today, where deriving always
creates an encodable impl that works for all encoders. (See
Issue #3740 for more details.) Token trees seem like the
obvious syntax to permit here.

Macros in decorator position. Eventually, I’d like it to be possible
for any macro to be attached to an item definition as a decorator. The
basic idea is that @foo(abc) struct Bar { ... } would be syntactic
sugar for (something like) @foo((abc), (struct Bar { ... }))
(presuming foo is a macro).

An aside: it occurs to me that to make this possible before 1.0 as I
envisioned it, we’ll need to at least reserve macro names so they
cannot be used as attributes. It might also be better to have macros
declare whether or not they want to be usable as decorators, just so
we can give better error messages. This has some bearing on the
“disadvantages” of the @ syntax discussed below, as well.

Using macros in decorator position would be useful for those cases
where the macro is conceptually “modifying” a base fn
definition. There are numerous examples: memoization, some kind of
generator expansion, more complex variations on deriving or
pretty-printing, and so on. A specific example from the past was the
externfn! wrapper that would declare both an extern "C" function
and some sort of Rust wrapper (I don’t recall precisely why). It was
used roughly like so:

externfn! {
    fn foo(...) { ... }
}

Clearly, this would be nicer if one wrote it as:

@extern
fn foo(...) { ... }

Token trees as the interface to rule them all. Although the idea
of permitting macros to appear in attribute position seems to largely
erase the distinction between today’s “decorators”, “syntax
extensions”, and “macros”, there remains the niggly detail of the
implementation. Let’s just look at deriving as an example: today,
deriving is a transform from one AST node to some number of AST
nodes. Basically it takes the AST node for a type definition and emits
that same node back along with various nodes for auto-generated impls.
This is completely different from a macro-rules macro, which operates
only on token trees. The plan has always been to move deriving out
of the compiler proper and make it “just another” syntax extension
that happens to be defined in the standard library (the same applies
to other standard macros like format and so on).

In order to move deriving out of the compiler, though, the interface
will have to change from ASTs to token trees. There are two reasons
for this. The first is that we are simply not prepared to standardize
the Rust compiler’s AST in any public way (and have no near term plans
to do so). The second is that ASTs are insufficiently general. We
have syntax extensions to accept all kinds of inputs, not just Rust
ASTs.

Note that syntax extensions, like deriving, that wish to accept Rust
ASTs can easily use a Rust parser to parse the token tree they are
given as input. This could be a cleaned up version of the libsyntax
library that rustc itself uses, or a third-party parser module
(think Esprima for JS). Using separate libraries is advantageous for
many reasons. For one thing, it allows other styles of parser
libraries to be created (including, for example, versions that support
an extensible grammar). It also allows syntax extensions to pin to an
older version of the library if necessary, allowing for more
independent evolution of all the components involved.

What are the objections?

Two objections were raised: first, that the @ sigil is too heavyweight
and obtrusive; and second, that there is an inherent ambiguity, since
@id() can serve as both an attribute and a macro.

The first point seems to be a matter of taste. I don’t find @
particularly heavyweight, and I think that choosing a suitable color
for the emacs/vim modes will probably help quite a bit in making it
unobtrusive. In contrast, I think that ! has a strong connotation
of “dangerous” which seems inappropriate for most macros. But neither
syntax seems particularly egregious: I think we’ll quickly get used to
either one.

The second point regarding potential ambiguities is more
interesting. The ambiguities are easy to resolve from a technical
perspective, but that does not mean that they won’t be confusing to
users.

Parenthesized macro invocations

The first ambiguity is that @foo() can be interpreted as either an
attribute or a macro invocation. The observation is that @foo() as a
macro invocation should behave like existing syntax, which means that
either it should behave like a method call (in a fn body) or a tuple
struct (at the top-level). In both cases, it would have to be followed
by a “terminator” token: either a ; or a closing delimiter (),
], or }). Therefore, we can simply peek at the next token to
decide how to interpret @foo() when we see it.

I believe that, using this disambiguation rule, almost all existing
code would continue to parse correctly if it were mass-converted to
use @foo in place of the older syntax. The one exception is
top-level macro invocations. Today it is common to write something
like:

declaremethods!(foo, bar)
struct SomeUnrelatedStruct { ... }

where declaremethods! expands out to a set of method declarations or
something similar.

If you just transliterate this to @, then the macro would be parsed
as a decorator:
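
Sketching it out:

@declaremethods(foo, bar)
struct SomeUnrelatedStruct { ... }

whereas to get the old free-standing behavior, one would have to write
a terminating semicolon:

@declaremethods(foo, bar);
struct SomeUnrelatedStruct { ... }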

Note that both of these are more consistent with our syntax in
general: tuple structs, for example, are always followed by a ; to
terminate them. (If you replace @declaremethods(foo, bar) with
struct Struct1(foo, bar), then you can see what I mean.) However,
today if you fail to include the semicolon, you get a parser error,
whereas here you might get a surprising misapplication of the macro.

Macro invocations with braces, square or curly

Until recently, attributes could only be applied to items. However,
recent RFCs have proposed extending attributes so that they can be
applied to blocks and expressions. These RFCs introduce additional
ambiguities for macro invocations based on [] and {}:

@foo{...} could be a macro invocation or an annotation @foo
applied to the block {...},

@foo[...] could be a macro invocation or an annotation @foo
applied to the expression [...].

These ambiguities can be resolved by requiring inner attributes for
blocks and expressions. Hence, rather than @cold x + y, one would
write (@!cold x) + y. I actually prefer this in general, because it
makes the precedence clear.

OK, so what are the options?

Using @ for attributes is popular. It is the use with macros that is
controversial. Therefore, as I see it, there are three things on the
table:

Use @foo for attributes, keep foo! for macros (status quo-ish).

Use @foo for both attributes and macros (the proposal).

Use @[foo] for attributes and @foo for macros (a compromise).

Option 1 is roughly the status quo, but moving from #[foo] to @foo
for attributes (this seemed to be universally popular). The obvious
downside is that we lose ! forever and we also miss an opportunity
to unify attribute and macro syntax. We can still adopt the model
where decorators and macros are interoperable, but it will be a little
more strange, since they look very different.

The advantages of Option 2 are what I’ve been talking about this whole
time. The most significant disadvantage is that adding a semicolon can
change the interpretation of @foo() in a surprising way,
particularly at the top-level.

Option 3 offers most of the advantages of Option 2, while retaining a
clear syntactic distinction between attributes and macro usage. The
main downside is that @[deriving(Eq)] and @[inline] arguably look
less clean than @deriving(Eq) and @inline, which follow the
precedent of other languages more closely.

What to do?

Currently I personally lean towards options 2 or 3. I am not happy
with Option 1 both because I think we should reserve ! and because I
think we should move attributes and macros closer together, both in
syntax and in deeper semantics.

Choosing between options 2 and 3 is difficult. It seems to boil down
to whether you feel the potential ambiguities of @foo() outweigh the
attractiveness of @inline vs @[inline]. I don’t personally have a
strong feeling on this particular question. It’s hard to say how
confusing the ambiguities will be in practice. I would be happier if
placing or failing to place a semicolon at the right spot yielded a
hard error.

So I guess I would summarize my current feeling as being happy with
either Option 2, but with the proviso that it is an error to use a
macro in decorator position unless it explicitly opts in, or Option 3,
without that proviso. This seems to retain all the upsides and avoid
the confusing ambiguities.

Appendix: A brief explanation of token trees

Token trees are the basis for our macro-rules macros. They are a
variation on token streams in which tokens are basically uninterpreted
except that matching delimiters ((), [], {}) are paired up. A
macro-rules macro is then “just” a translation from one token tree to
another token tree. This output token tree is then parsed as
normal. Similarly, our parser is actually not defined over a stream
of tokens but rather a token tree.

Our current implementation deviates from this ideal model in some
respects. For one thing, macros take as input token trees with
embedded asts, and the parser parses a stream of tokens with embedded
token trees, rather than token trees themselves, but these details are
not particularly relevant to this post. I also suspect we ought to
move the implementation closer to the ideal model over time, but
that’s the subject of another post.

]]>2014-07-09T10:08:00-04:00http://smallcultfollowing.com/babysteps/blog/2014/07/09/an-experimental-new-type-inference-scheme-for-rustWhile on vacation, I’ve been working on an alternate type inference
scheme for rustc. (Actually, I got it 99% working on the plane, and
have been slowly poking at it ever since.) This scheme simplifies the
code of the type inferencer dramatically and (I think) helps to meet
our intuitions (as I will explain). It is however somewhat less
flexible than the existing inference scheme, though all of rustc and
all the libraries compile without any changes. The scheme will (I
believe) make it much simpler to implement proper one-way matching
for traits (explained later).

Note: Changing the type inference scheme doesn’t really mean much to
end users. Roughly the same set of Rust code still compiles. So this
post is really mostly of interest to rustc implementors.

The new scheme in a nutshell

The new scheme is fairly simple. It is based on the observation that
most subtyping in Rust arises from lifetimes (though the scheme is
extensible to other possible kinds of subtyping, e.g. virtual
structs). It abandons unification and the H-M infrastructure and takes
a different approach: when a type variable V is first related to
some type T, we don’t set the value of V to T directly. Instead,
we say that V is equal to some type U where U is derived by
replacing all lifetimes in T with lifetime variables. We then relate
T and U appropriately.

Let me give an example. Here are two variables whose type must be
inferred:

'a: { // 'a --> name of block's lifetime
    let x = 3;
    let y = &x;
    ...
}

Let’s say that the type of x is $X and the type of y is $Y,
where $X and $Y are both inference variables. In that case, the
first assignment generates the constraint that int <: $X and the
second generates the constraint that &'a $X <: $Y. To resolve the
first constraint, we would set $X directly to int. This is because
there are no lifetimes in the type int. To resolve the second
constraint, we would set $Y to &'0 int – here '0 represents a
fresh lifetime variable. We would then say that &'a int <: &'0 int,
which in turn implies that '0 <= 'a. After lifetime inference is
complete, the types of x and y would be int and &'a int as
expected.

Without unification, you might wonder what happens when two type
variables are related that have not yet been associated with any
concrete type. This is actually somewhat challenging to engineer, but
it certainly does happen. For example, there might be some code like:
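
Something like this (a sketch):

let mut x;      // x has type $X, not yet known
let y = None;   // y has type Option<$0>, not yet known
x = y.unwrap(); // unwrap() yields $0, so we must relate $0 <: $X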

Here, at the point where we process x = y.unwrap(), we do not yet
know the values of either $X or $0. We can say that the type of
y.unwrap() will be $0 but we must now process the constraint that
$0 <: $X. We do this by simply keeping a list of outstanding
constraints. So neither $0 nor $X would (yet) be assigned a
specific type, but we’d remember that they were related. Then, later,
when either $0 or $X is set to some specific type T, we can go
ahead and instantiate the other with U, where U is again derived
from T by replacing all lifetimes with lifetime variables. Then we
can relate T and U appropriately.

If we wanted to extend the scheme to handle more kinds of inference
beyond lifetimes, it can be done by adding new kinds of inference
variables. For example, if we wanted to support subtyping between
structs, we might add struct variables.

What advantages does this scheme have to offer?

The primary advantage of this scheme is that it is easier to think
about for us compiler engineers. Every type variable is either set
– in which case its type is known precisely – or unset – in which
case its type is not known at all. In the current scheme, we track a
lower- and upper-bound over time. This makes it hard to know just how
much is really known about a type. Certainly I know that when I think
about inference I still think of the state of a variable as a binary
thing, even though I know that really it’s something which evolves.

What prompted me to consider this redesign was the need to support
one-way matching as part of trait resolution. One-way matching is
basically a way of saying: is there any substitution S such that T
<: S(U) (whereas normal matching searches for a substitution applied
to both sides, like S(T) <: S(U)).

One-way matching is very complicated to support in the current
inference scheme: after all, if there are type variables that appear
in T or U which are partially constrained, we only know bounds
on their eventual type. In practice, these bounds actually tell us a
lot: for example, if a type variable has a lower bound of int, it
actually tells us that the type variable is int, since in Rust’s
type system there are no super- or sub-types of int. However,
encoding this sort of knowledge is rather complex – and ultimately
amounts to precisely the same thing as this new inference scheme.

Another advantage is that there are various places in Rust’s type
checker where we query the current state of a type variable and make
decisions as a result. For example, when processing *x, if the type
of x is a type variable T, we would want to know the current state
of T – is T known to be something inherently derefable (like &U
or &mut U) or a struct that must implement the Deref trait? The
current APIs for doing this bother me because they expose the bounds
of T – but those bounds can change over time. This seems “risky” to
me, since it’s only sound for us to examine those bounds if we either
(a) freeze the type of T or (b) are certain that we examine
properties of the bound that will not change. This problem does not
exist in the new inference scheme: anything that might change over
time is abstracted into a new inference variable of its own.

What are the disadvantages?

One form of subtyping that exists in Rust is not amenable to this
inference. It has to do with universal quantification and function
types. Function types that are “more polymorphic” can be subtypes of
functions that are “less polymorphic”. For example, if I have a
function type like <'a> fn(&'a T) -> &'a uint, this indicates a
function that takes a reference to T with any lifetime 'a and
returns a reference to a uint with that same lifetime. This is a
subtype of the function type fn(&'b T) -> &'b uint. While these
two function types look similar, they are quite different: the former
accepts a reference with any lifetime but the latter accepts only a
reference with the specific lifetime 'b.

What this means is that today if you have a variable that is assigned
many times from functions with varying amounts of polymorphism,
we will generally infer its type correctly:
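
A sketch (the function names are illustrative):

fn any_lifetime<'a>(x: &'a T) -> &'a uint { ... }
fn one_lifetime(x: &'b T) -> &'b uint { ... }

let mut f = any_lifetime; // f: <'a> fn(&'a T) -> &'a uint
f = one_lifetime;         // ok today: f's type adjusts to fn(&'b T) -> &'b uint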

However, this will not work in the newer scheme. Type ascription of
some form would be required. As you can imagine, this is not a very
common problem, and it did not arise in any existing code.

(I believe that there are situations in which the newer scheme infers
correct types and the older scheme will fail to compile; however, I
was unable to come up with a good example.)

How does it perform?

I haven’t done extensive measurements. The newer scheme creates a lot
of region variables. It seems to perform roughly the same as the older
scheme, perhaps a bit slower – optimizing region inference may be
able to help.

]]>2014-07-06T11:10:00-04:00http://smallcultfollowing.com/babysteps/blog/2014/07/06/implied-boundsI am on vacation for a few weeks. I wanted to take some time to jot
down an idea that’s been bouncing around in my head. I plan to submit
an RFC at some point on this topic, but not yet, so I thought I’d
start out by writing a blog post. Also, my poor blog has been
neglected for some time. Consider this a draft RFC. Some important
details about references are omitted and will come in a follow-up blog
post.

The high-level summary of the idea is that we will take advantage of
bounds declared in type declarations to avoid repetition in fn and
impl declarations.

Summary and motivation

Recent RFCs have introduced the ability to declare bounds within type
declarations. For example, a HashMap type might be defined as
follows:

struct HashMap<K:Hash,V> { ... }
trait Hash : Eq { ... }

These type declarations indicate that every hashmap is parameterized
by a key type K and a value type V. Furthermore, K must be a
hashable type. (The trait definition for Hash, meanwhile, indicates
that every hashable type must also be equatable.)

Currently, the intention with these bounds is that every time the user
writes HashMap<SomeKey,SomeValue>, the compiler will run off and
verify that, indeed, SomeKey implements the trait Hash. (Which in
turn implies that SomeKey implements Eq.)

This RFC introduces a slight twist to this idea. For the types of
function parameters as well as the self types of impls, we will not
verify their bounds immediately, but rather attach those bounds as
where clauses on the fn. This shifts the responsibility for
proving the bounds are satisfied onto the fn’s caller; in turn, it
allows the fn to assume that the bounds are satisfied. The net
result is that you don’t have to write as many duplicate bounds.

As applied to type parameter bounds

Let me give an example. Here is a generic function that inserts a key
into a hashmap if there is no existing entry for the key:
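
For example (the fn name is illustrative):

fn insert_if_absent<K,V>(map: &mut HashMap<K,V>, key: K, value: V) {
    if !map.contains_key(&key) {
        map.insert(key, value);
    }
}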

Today this function would not type-check because the type K has no
bounds. Instead one must declare K:Hash. But this bound feels rather
pointless – after all, the fact that the function takes a hashmap
as argument implies that K:Hash. With the proposed change,
however, the fn above is perfectly legal.

Because impl self types are treated the same way, it will also be less
repetitious to define methods on a type. Whereas before one would
have to write:

impl<K:Hash,V> HashMap<K,V> {
...
}

it is now sufficient to leave off the Hash bound, since it will be
inferred from the self-type:

impl<K,V> HashMap<K,V> {
...
}

As applied to lifetimes

In fact, we already have a similar rule for
lifetimes. Specifically, in some cases, we will infer a relationship
between the lifetime parameters of a function. This is the reason that
the following function is legal:
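
A sketch (assuming a struct Foo with a field named field):

struct Foo { field: int }

fn inner_field<'a,'b>(x: &'a &'b Foo) -> &'a int {
    &(**x).field
}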

Here, the lifetime of (**x).field (when all dereferences are written
in full) is most properly 'b, but we are returning a reference with
lifetime 'a. The compiler permits this because there exists a
parameter of type &'a &'b Foo – from this, the compiler infers that
'a <= 'b. The basis for this inference is a rule that you cannot
have a reference that outlives its referent. This is very helpful for
making some programs typecheck: this is particularly true with generic
traits, as described in this blog post.

Detailed design

Well-formed types and the BOUNDS function

We say that a type is well-formed if all of its bounds are met. We
define a function BOUNDS(T) that maps from a type T to the set of
bounds that must be satisfied for T to be called well-formed.

For the scalar types like int or float, BOUNDS just returns the
empty set:

BOUNDS(int) = {}
BOUNDS(uint) = {}
BOUNDS(...) = {}

For struct types like HashMap<SomeKey,SomeValue>, the function
combines the bounds declared on the HashMap type with those declared
on SomeKey and SomeValue. (The SUBST() function is used to
substitute the actual type parameters T1 ... Tn for their formal
counterparts.)
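
Something like this (a sketch of the intended definition):

BOUNDS(HashMap<T1,T2>) = SUBST([K,V] => [T1,T2], {K : Hash})
                         UNION BOUNDS(T1) UNION BOUNDS(T2)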

Well-formed references

Note that I have not defined the LOWER-BOUND function. The proper
definition of this function is important and I have been working on
it, but I prefer to defer that subject to a post/RFC of its own.
(Clarifying the lower-bound function, however, is the heart of #5723
along with a number of other recent bugs being filed on lifetimes.)
Note that this definition subsumes the existing rule for references
described in my prior blog post.

]]>2014-05-14T07:52:00-04:00http://smallcultfollowing.com/babysteps/blog/2014/05/14/follow-up-to-focusing-on-ownershipThis post withdrawn: it was posted by accident and was incomplete.

Philosophical motivation

The main reason I want to do this is because I believe it makes the
language more coherent and easier to understand. Basically, it
refocuses us from talking about mutability to talking about
aliasing (which I will call “sharing”, see below for more on that).

Mutability becomes a sideshow that is derived from uniqueness: “You
can always mutate anything that you have unique access to. Shared data
is generally immutable, but if you must, you can mutate it using some
kind of cell type.”

Put another way, it’s become clear to me over time that the problems
with data races and memory safety arise when you have both aliasing
and mutability. The functional approach to solving this problem is to
remove mutability. Rust’s approach would be to remove aliasing. This
gives us a story to tell and helps to set us apart.

A note on terminology: I think we should refer to aliasing as
sharing. In the past, we’ve avoided this because of its
multithreaded connotations. However, if/when we implement the
data parallelism plans I have proposed, then this
connotation is not at all inappropriate. In fact, given the
close relationship between memory safety and data races, I
actually want to promote this connotation.

Educational motivation

I think that the current rules are harder to understand than they have
to be. It’s not obvious, for example, that &mut T implies no
aliasing. Moreover, the notation &mut T suggests that &T implies
no mutability, which is not entirely accurate, due to types like
Cell. And nobody can agree on what to call them (“mutable/immutable
reference” is the most common thing to say, but it’s not quite right).

In contrast, a type like &my T or &only T seems to make
explanations much easier. This is a unique reference – of course
you can’t make two of them pointing at the same place. And
mutability is an orthogonal thing: it comes from uniqueness, but
also cells. And the type &T is precisely its opposite, a shared
reference. RFC PR #58 makes a number of similar arguments. I
won’t repeat them here.

Practical motivation

Currently there is a disconnect between borrowed pointers, which can
be either shared or mutable+unique, and local variables, which are
always unique, but may be mutable or immutable. The end result of this
is that users have to place mut declarations on things that are not
directly mutated.

Locals can’t be modeled using references

This phenomenon arises from the fact that references are just not as
expressive as local variables. In general, this hinders abstraction.
Let me give you a few examples to explain what I mean. Imagine I have
an environment struct that stores a pointer to an error counter:

But that is wrong. The problem is that &Env is an aliasable type,
and hence env.errors appears in an aliasable location. To make this
code work, I have to declare env as mutable and use an &mut
reference:
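
Continuing the sketch:

fn record_error(env: &mut Env) {
    *env.errors += 1; // ok: unique access to env implies unique access to env.errors
}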

This problem arises because we know about locals being unique, but we
can’t put that knowledge into a borrowed reference without making it
mutable.

This problem arises in a number of other places. Until now, we’ve
papered over it in a variety of ways, but I continue to feel like
we’re papering over a disconnect that just shouldn’t be there.

Type-checking closures

We had to work around this limitation with closures. Closures are
mostly desugarable into structs like Env, but not quite. This is
because I didn’t want to require that &mut locals be declared mut
if they are used in closures. In other words, given some code like:

fn foo(errors: &mut int) {
    do_something(|| *errors += 1)
}

The closure expression will in fact create an Env struct like:

struct ClosureEnv<'a, 'b> {
    errors: &'a uniq &'b mut int
}

Note the &uniq reference. That’s not something an end-user can type.
It means a “unique but not necessarily mutable” pointer. It’s needed
to make this all type check. If the user tried to write that struct
manually, they’d have to write &mut &mut int, which would in turn
require that the errors parameter be declared mut errors: &mut
int.

Unboxed closures and procs

I foresee this limitation being an issue for unboxed closures. Let me
elaborate on the design I was thinking of. Basically, the idea would
be that a || expression is equivalent to some fresh struct type that
implements one of the Fn traits:
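
A sketch of what I mean (the signatures are illustrative):

fn apply(f: FnMut<int, int>, x: int) -> int {
    ...
}

apply(|x| x * 2, 3); // the closure expands to a fresh struct implementing FnMut<int, int>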

We’ll … probably want to bikeshed the syntax, maybe add sugar like
FnMut(int) -> int or retain |int| -> int, etc. That’s not so
important, what matters is that we’d be passing in the closure by
value. Note that with current DST rules it is legal to pass in a
trait type by value as an argument, so the FnMut<int,int> argument
is legal under DST and not an issue.

An aside: This design isn’t complete and I will describe the full
details in a separate post.

The problem is that calling the closure will require an &mut
reference. Since the closure is passed by value, users will again
have to write a mut where it doesn’t seem to belong:
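
Something like (continuing the sketch):

fn apply(mut f: FnMut<int, int>, x: int) -> int {
    f(x) // calling f requires &mut f, hence the mut above
}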

This is the same problem as the Env example above: what’s really
happening here is that the FnMut trait just wants a unique
reference, but since that is not part of the type system, it requests
a mutable reference.

Now, we can probably work around this in various ways. One thing we
could do is to have the || syntax not expand to “some struct type”
but rather “a struct type or a pointer to a struct type, as dictated
by inference”. In that case, the callee could write:
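
For example (a sketch):

fn apply(f: &mut FnMut<int, int>, x: int) -> int {
    (*f)(x)
}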

I don’t mean to say this is the end of the world. But it’s one more in
a growing list of contortions we have to go through to retain this split
between locals and references.

Other parts of the API

I haven’t done an exhaustive search, but naturally this distinction
creeps in elsewhere. For example, to read from a Socket, I need a
unique pointer, so I have to declare it mutable. Therefore, something
like this doesn’t work:
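
Something like this (a sketch, assuming a read method that requires a
unique reference to the socket):

fn read_request(socket: Socket) {  // socket not declared mut
    let byte = socket.read_byte(); // ERROR: cannot take unique reference to socket
    ...
}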

Naturally, in my proposal, code like this would work fine. You’d still
get an error if you tried to read from a &Socket, but then it would
say something like “can’t create a unique reference to a shared
reference”, which I personally find more clear.

But don’t we need mut for safety?

No, we don’t. Rust programs would be equally sound if you just
declared all bindings as mut. The compiler is perfectly capable of
tracking which locals are being mutated at any point in time –
precisely because they are local to the current function. What the
type system really cares about is uniqueness.

The value I see in the current mut rules, and I won’t deny there is
value, is primarily that they help to declare intent. That is, when
I’m reading the code, I know which variables may be reassigned. On the
other hand, I spend a lot of time reading C++ code too, and to be
honest I’ve never noticed this as a major stumbling block. (Same goes
for the time I’ve spent reading Java, JavaScript, Python, or Ruby
code.)

It is also true that I have occasionally found bugs because I declared
a variable as mut and failed to mutate it. I think we could get
similar benefits via other, more aggressive lints (e.g., none of the
variables used in the loop condition are mutated in the loop body). I
personally cannot recall having encountered the opposite situation:
that is, if the compiler says something must be mutable, that
basically always means I forgot a mut keyword somewhere. (Think:
when was the last time you responded to a compiler error about illegal
mutation by doing anything other than restructuring the code to make
the mutation legal?)

Alternatives

I see three alternatives to the current system:

The one I have given, where you just drop “mutability” and track
only uniqueness.

One where you have three reference types: &, &uniq, and
&mut. (As I wrote, this is in fact the type system we have today,
at least from the borrow checker’s point of view.)

A stricter variant in which “non-mut” variables are always
considered aliased. That would mean that you’d have to write:
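
Something like (a sketch):

fn foo() {
    let mut p = &mut v; // mut required, else p itself is considered aliased
    *p += 1;            // ...and this mutation would be illegal
}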

You’d need to declare p as mut because otherwise it’d be
considered aliased, even though it’s a local, and hence mutating
*p would be illegal. What feels weird about this scheme is that
the local variable is not aliased, and we clearly know that,
since we will allow it to be moved, run destructors on it and so
forth. That is, we still have a notion of “owned” that is distinct
from “not aliased”.

On the other hand, if we described this system by saying that
mutability inherits through &mut pointers, and not by talking
about aliasing at all, it might make sense.

Of these three, I definitely prefer #1. It’s the simplest, and right
now I am most concerned with how we can simplify Rust while retaining
its character. Failing that, I think I prefer what we have right now.

Conclusions

Basically, I feel like the current rules around mutability have some
value, but they come at a cost. They are presenting a kind of leaky
abstraction: that is, they present a simple story that turns out to be
incomplete. This causes confusion for people as they transition from
the initial understanding, in which &mut is how mutability works,
into the full understanding: sometimes mut is needed just to get
uniqueness, and sometimes mutability comes without the mut keyword.

Moreover, we have to bend over backwards to maintain the fiction that
mut means mutable and not unique. We had to add special cases to
borrowck to check closures. We have to make the rules around &mut
mutability more complex in general. We have to either add mut to
closures so that we can call them, or make closure expressions have a
less obvious desugaring. And so forth.

Finally, we wind up with a more complicated language overall. Instead
of just having to think about aliasing and uniqueness, the user has to
think about both aliasing and mutability, and the two are somehow
tangled up together.

I don’t think it’s worth it.

]]>2014-04-24T19:33:00-04:00http://smallcultfollowing.com/babysteps/blog/2014/04/24/parallel-pipelines-for-jsI’ve been thinking about an alternative way to factor the PJS API.
Until now, we’ve had these methods like mapPar(), filterPar() and
so forth. They work mostly like their sequential namesakes but execute
in parallel. This API has the advantage of being easy to explain and
relatively clear, but it’s also not especially flexible nor elegant.

Lately, I’ve been prototyping an alternate design that I call
parallel pipelines (that’s just a working title; I expect the name
to change). Compared to the older approach, parallel pipelines are a
more expressive API that doesn’t clutter up the array prototypes. The
design draws on precedent from a lot of other languages, such as
Clojure, Ruby, and Scala, which all offer similar capabilities. I’ve
prototyped the API on a branch of SpiderMonkey, though the
code doesn’t yet run in parallel (it is structured in such a way as to
make parallel execution relatively straightforward, though).

CAVEAT: To be clear, this design is just one that’s in my head. I
still have to convince everyone else it’s a good idea. :) Oh, and one
other caveat: most all the names in here are just temporary, I’m sure
they’ll wind up changing. Along with probably everything else.

Pipelines in a nutshell

The API begins with a single method called parallel() attached to
Array.prototype and typed object arrays. When you invoke
parallel(), no actual computation occurs yet. Instead, the result is
a parallel pipeline that, when executed, will iterate over the
elements of the array.

You can then call methods like map and filter on this
pipeline. None of these transformers takes any immediate action;
instead they just return a new parallel pipeline that will, when
executed, perform the appropriate map or filter.
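
For example (a sketch, assuming an input array array):

var pipeline = array.parallel()
                    .map(e => e * 3)
                    .filter(e => e % 2 === 0);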

This yields a pipeline that, when executed, will multiply each element
of the array by 3 and then select the results that are even.

Once you’ve finished building up your pipeline, you execute it by
using one of two methods, toArray() or reduce(). toArray() will
execute the pipeline and return a new array with the results.
reduce() will execute the pipeline, but instead of returning an array
it reduces the elements to a single scalar result.

Execution works the same way as PJS today: that is, we will attempt to
execute in parallel. If your code mutates global state, or uses other
features of JS that are not safe for parallel execution, then you will
wind up with a sequential fallback semantics.

Pipelines and typed objects

The pipeline API is integrated with typed objects. Each pipeline stage
generates values of a specific type; when you toArray() the result,
you get back a typed object array based around this type.

Producing typed object arrays doesn’t incur any limitations vs using a
normal JS array, because the element type can always just be any.
Moreover, in those cases where you are able to produce a more
specialized type, such as int32, you will get big savings in memory
usage since typed object arrays enable a very compact representation.

Ranges

In the previous example, I showed how to create a pipeline given an
array as the starting point. Sometimes you want to create parallel
operations that don’t have any array but simply iterate over a range
of integers. One obvious case is when you are producing a fresh array
from scratch.

To support this, we will add a new “parallel” module with a variety of
functions for producing pipelines from scratch. One such function is
range(min, max), which just produces a range of integers starting
with min and stepping up to max. So if we wanted to compute
the first N Fibonacci numbers in parallel, we could write:

var fibs = parallel.range(0, N).map(fibonacci).toArray();

In fact, using range(), we can implement the parallel() method
for normal JS arrays:
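
Roughly like so (a sketch):

Array.prototype.parallel = function() {
  var self = this;
  return parallel.range(0, self.length).map(i => self[i]);
};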

Shapes and n-dimensional pipelines

Arrays are great, but it frequently happens that we want to work with
multiple dimensions. For this reason, parallel pipelines are not
limited to iterating over a single dimensional space. They can
also iterate over multiple dimensions simultaneously.

We call the full iteration space a shape. A shape is a list whose
length corresponds to the number of dimensions. So a
1-dimensional iteration, such as that produced by range(), has a
shape like [N]. But a 2-d iteration might have the shape [W, H]
(where W and H might be the width and height of an image).
Similarly, iterating over some 3-D space would have a shape
[X, Y, Z].

To iterate over a parallel shape, you can use the parallel.shape()
function. For example, the following command iterates over a 5x5
space, and produces a two-dimensional typed object array of integers:
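
Something like this (a sketch; mapTo is described later):

var matrix = parallel.shape([5, 5])
                     .mapTo(int32, ([x, y]) => x + y)
                     .toArray();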

You can see that shape() produces a vector [x, y] specifying the
current coordinates of each element in the space. In this case, we map
that result and add x and y, which means that the end result will be:

0 1 2 3 4
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8

Another way to get N-dimensional iteration is to start with an
N-dimensional typed object array. The parallel() method on typed
object arrays takes an optional depth argument specifying how many
of the outer dimensions you want to iterate over in parallel; this
argument defaults to 1. This means we could further transform our matrix
as shown here:

var matrix2 = matrix.parallel(2).map(i => i + 1).toArray();

The end result would be to add one to each cell in the matrix:

1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9

Deferred pipelines

All the pipelines I showed so far were essentially “single shot”. They
began with a fixed array and applied various operations to it and then
created a result. But sometimes you would like to specify a pipeline
and then apply it to multiple different arrays. To support this, you
can create a “detached” pipeline. For example, the following code
would create a pipeline for incrementing each element by one:

var pipeline = parallel.detached().map(i => i + 1);

Before you actually execute a detached pipeline, you must attach it to
a specific input. The result is a new, attached pipeline which can
then be converted to an array or reduced. Of course, you can attach
the pipeline many times:
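
Something like (a sketch, with array1 and array2 as illustrative inputs):

var result1 = pipeline.attach(array1).toArray();
var result2 = pipeline.attach(array2).toArray();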

Put it all together

Here is a relatively complete, if informal, description of the pipeline
methods I’ve thought about thus far.

Creating pipelines

The fundamental ways to create a pipeline are the methods range(),
shape(), and detached(), all available from a parallel module.
In addition, Array.prototype and the prototype for typed object
arrays both feature a parallel() method that creates a pipeline
as we showed before.

Transforming pipelines

Each of the methods described here are available on all pipelines.
Each produces a new pipeline.

The most common methods for transforming a pipeline will probably be
map and mapTo:

pipeline.map(func) – invokes func on each element, preserving
the same output type as pipeline.

pipeline.mapTo(type, [func]) – invokes func on each element,
converting the result to have the type type. In fact, func is
optional if you just want to convert between types.

map and mapTo are somewhat special in that they work equally well
over any number of dimensions. The rest of the available methods
always operate only over the outermost dimension of the pipeline.
They also create a single dimensional output. Because this is a blog
post and not a spec, I won’t bother writing out the descriptions in
detail:

pipeline.flatMap(func) – like map, but flatten one layer of arrays

pipeline.filter(func) – drop elements for which func returns false

pipeline.scan(func) – prefix sum

pipeline.scatter(...) – move elements from one index to another

Finally, there is the attach() method, which is only applicable to
detached pipelines. It produces a new pipeline that is attached to a
specific input. If the pipeline is already attached, an exception results.

Executing pipelines

There are two fundamental ways to execute a pipeline:

pipeline.toArray() – executes the pipeline and collects
the result into a new typed object array. The dimensions and type
of this array are determined by the pipeline.

pipeline.reduce(func, [initial]) – executes the pipeline and reduces
the results using func, possibly with an initial value. Returns
the result of this reduction.

Open questions

Should pipelines provide the index to the callback? I decided to
strive for simplicity and just say that pipeline transformers like
map always pass a single value to their callback. I imagine we
could add an enumerate transformer if indices are desired. But then
again, this pollutes the value being produced with indices that just
have to be stripped away. So maybe it’s better to just pass the index
as a second argument that the user can use or ignore as they choose. I
imagine that the index will be an integer for a 1D pipeline, and an
array for a multidimensional pipeline.

Should pipelines be iterable? It has been suggested that pipelines
be iterable. I guess that this would be equivalent to collecting the
pipeline into an array and then iterating over that. (Since, unless
you are doing a map operation, the only real purpose for iteration is
to produce side-effects.) This would be easy enough to add, I just was
worried that it might be misleading to see code like this:

for (var e of array.parallel().map(...)) {
  /* Maybe it looks like this for loop body
     executes in parallel? (Which it doesn't.) */
}

Updates

Renamed collect() to toArray(), which seems clearer.

]]>2014-04-01T20:33:00-04:00http://smallcultfollowing.com/babysteps/blog/2014/04/01/value-types-in-javascriptHere is the current state of my thinking with respect to value types
and value objects. Some of you may have seen
Brendan’s slides where he discusses value objects. This post
is about the same topic, but it is focused on just the initial part of
the work – what it means to be a value object and how we could define
value types and integrate them into the standard. I am not going to
discuss new syntax or operators yet. I have thoughts on those too but
I wanted to start by laying out the foundations.

The need for extensible value types

JavaScript has long had a division between primitive values and
objects. These two things are fundamentally rather different.
Primitive values have no identity and no prototype. They are not
allocated, just used. They are also immutable. Consider an integer
like var x = 1 – if I write x += 1, I haven’t incremented the
number 1 itself, I’ve just changed the variable x to have a new
value, 2.

Objects are rather different. When I create an object, it has
identity. If I execute the same expression twice, I get two
different objects. This is why {} === {} evaluates to false
(unlike, say, 1 === 1). In turn, objects have mutable contents, so I
write foo.x += 1 and mutate the contents of foo.

There is nothing wrong with the division between values and objects in
and of itself. Both have their place and are useful in certain
circumstances. What is unfortunate is that JavaScript makes the set
of value types inextensible. That is, the only value types I can have
are the primitives that the spec itself provides: booleans, numbers,
strings, and (in ES6) symbols. (I’ve probably forgotten one, but it
doesn’t matter.)

In this post, I’ll lay out a preliminary design for allowing users to
define and use their own value types. These types offer the same
advantages as the built-in types: they are immutable and have no
identity apart from their value. When used appropriately – i.e., in
places where a value is fundamentally what you want and not an
object – this makes programs easier to read and write and also
easier to optimize. Everybody wins!

Value types are both tasty and nutritious

Suppose that I have a type representing colors:

function Color(r, g, b, a) {
  this.r = r;
  this.g = g;
  this.b = b;
  this.a = a;
}

Now I can create a color by doing new Color(22, 44, 66, 88). This
conceptually represents a color. I will argue here that colors are an
example of a type that really wants to be a value. The fact that JS
forces us to represent colors as mutable objects is really wrong and
makes code harder and less convenient to write. Later on, we’ll see
how we could define Color as a value type, which would not only make
it more convenient but help to make our generated code more efficient
as well.

Comparisons

Now what if I want to tell whether the color of two rectangles is the
same? You’d hope I could just write rect1.color === rect2.color, but of
course I cannot, at least not reliably. The problem is that colors are
objects, and thus when we compare with === we are comparing
for object identity rather than asking whether they represent the same color.

To compare if two colors represent the same color, we have to write
some kind of equals function:
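
Something like this sketch:

Color.prototype.equals = function(c) {
  return this.r === c.r && this.g === c.g &&
         this.b === c.b && this.a === c.a;
};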

Now I have to remember to write code like
rect1.color.equals(rect2.color). This is not as pretty and of course
if I forget somewhere I’ll just get the wrong behavior. Too bad.

Mutation and aliasing

Another problem with using objects for colors is that they are
mutable. For something like colors, this is probably not what we
want. In particular, I’d like to be able to write code like:

rect2.color = rect1.color;

The problem is that if I do this, I have now linked rect1 and
rect2 to the same color object. So now if some other piece of code
tries to modify the color of rect1:

rect1.color.r += 3;

This change will also affect the color of rect2! That is almost
certainly not what we wanted to happen. Yuck.

Hard to optimize

The presence of pointer identity, aliasing, and mutability also
inhibits a wide variety of optimizations. For example, imagine I had a
loop like:

for (...) {
  ...
  doSomething("foo" + "bar");
  ...
}

Any JIT engine could, if it chose, safely lift that expression
"foo"+"bar" out of the loop and evaluate it exactly once, rather
than evaluating it on every iteration through the loop. But if I write
some similar code that constructs a Color instance, it will be much
harder to optimize:

for (...) {
  ...
  doSomething(new Color(255, 0, 0, 0));
  ...
}

We’d like to optimize this to create just one Color instead of one
per loop iteration. But we have to be very careful if we do so. After
all, what if doSomething mutated the fields of the color, like so:

function doSomething(c) {
  ...
  c.r += 1;
  ...
}

Now if we don’t create a new color on every iteration, we’ll just keep
modifying the same object. That’s no good.

Primitive types do not generalize to user-defined value types

OK, so I hope I’ve convinced you that it’d be nice to have
user-defined value types. You might think that it would be best to
model these user-defined value types after the existing primitive
types. I’d like to convince you that this is the wrong path.

The reason that modeling user-defined value types after primitives
is tempting is that primitives have a lot of the behavior we want:

Primitives are immutable. You can’t rewrite the contents of
a string, for example, you have to generate a new one.

Primitives do not have identity. === compares the value, not
the pointer. Two strings, for example, are equal if they contain
the same characters, regardless of where those characters are
stored.

However, primitives also come with a lot of other behavior that is
different from objects, and this behavior doesn’t really scale well
when you allow the set of primitives to be extended by the user:

typeof primitive yields a unique string, like "number" (whereas
all objects, regardless of their prototype, just get "object")

Primitives do not have prototypes, so if you evaluate
primitive.member, what happens is that the primitive is
automatically wrapped in a class like Number or String to yield
an object.

In particular, if you access a primitive value from another realm
(i.e., from an iframe), you copy just the primitive value. If you
try to invoke a method on it, it will get wrapped in the local
wrapper for the current realm, and not the wrapper from the
realm in which it originated.

These kinds of rules work fine for a fixed, well-known set of
primitive types. They do not scale well once we start introducing
arbitrary, user-defined primitive types.

To see why, consider typeof. If we allow user types to define what
string is returned from typeof, then this string is no longer
particularly unique. What do we do if two user-defined types claim the
same typeof string? What about if they try to forge an existing
string, like number?

The lack of prototypes is a bit of a problem as well. For each
primitive type, there is an implicit link to a well-known wrapper
type. But if users define their own primitive types, we’ll have to
link them to a (user-defined) wrapper type as well, so that we can add
methods to those types.

This link gets very thorny in a cross-realm scenario: in that case, if
we want to act like primitives, we need to find a corresponding
wrapper function between the two realms. But there is no guarantee
that the two realms will define the same set of types and no
particularly good way to link those types up even if both realms did
so. So what do we do?

I think the answer is simply that we should not try to model value
types on primitives. After all, the set of classes is already
extensible and has already addressed these problems:

Objects use their prototype to link to their constructor function.

Objects always yield "object" for typeof checks.

Cross-realm objects carry a link (via their prototype) back to their
original realm, side-stepping the need to synchronize class
definitions between realms.

Generalizing typed objects to support user-defined value types

Therefore, I think we should focus on value objects. A value object
is an object whose contents are immutable and which has no
identity. Value objects are based on typed objects – to create one,
users first define a custom value struct type or value array type
and then instantiate it.

The plan can be summarized as follows:

All “primitive types” today are also “value types” (e.g., ints,
uints, objects, etc). (To be clear, when I say “value type”, I mean
“a type whose instances have no individual identity”, so this includes
primitives but also value objects.)

A user-defined struct or array can be made into a value type via a
“valueType()” transformer like var Point = new StructType({x:
uint8, y: uint8}).valueType();

For this to be legal, all of its fields must be of value
type as well. (see appendix A)

All value types are also opaque types.

Instances of a value type (called “value objects”) are equivalent to normal
typed objects except for three differences:

2.1. You cannot assign to their properties (naturally).

2.2. They are compared for equality by comparing the values of each field
for equality, recursively.

2.3. If you have a non-value-type with a property p of value type,
and you reference p, the data is copied out into a new value
object. This is basically just an extension of the existing rule
for ints etc.

Explanation through examples

Let me give some examples to show you how it all works. So if I write
something like this, this is a value type:

var Point = new StructType({x: uint8, y: uint8}).valueType();

Now instances of Point are immutable:

var point = Point();
point.x = 1; // No effect; see appendix B.
assertEq(point.x, 0);

I can also create an aggregate value type structure:

var Line = new StructType({from: Point, to: Point}).valueType();
var line = Line();
line.from.x = 1; // No effect
assertEq(line.from.x, 0);

I can also put Point instances into something that is NOT a value
object, in this case an array:
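
For example (a sketch; the array type and initializer are illustrative):

var Points = Point.arrayType(3); // an ordinary, mutable typed object array
var points = new Points([{x: 0, y: 0}, {x: 1, y: 1}, {x: 2, y: 2}]);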

Now this raises a question of mutability. Since the points are stored
inline in the array (i.e., this is NOT an array of pointers-to-points
but just point structs), what happens if I reassign one of its
elements:

var p = points[0];
points[0] = Point({x: 5, y: 6});

In particular, did the values of p change? If so, that’s weird,
because p is a Point and hence supposed to be immutable.

This is addressed by rule 2.3, which says that a read of a value type
creates a copy (if the owner is not a value type). Hence p is not
a pointer into points but rather its own object. Thus mutating
points[0] has no effect on p.

Now there was one last point, which has to do with equality. Points
are value types, so we want it to be true that if I create two points
with identical fields, they should be equal:
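
For example (a sketch):

var p = Point({x: 1, y: 2});
var q = Point({x: 1, y: 2});
assertEq(p === q, true);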

For ordinary typed objects, this would not be the case: they would
have distinct buffers. But for value objects, it should hold, and it
does, thanks to rule 2.2 which redefines === for value types.

Rule 2.2 also has another important implication. Without rule 2.2,
the “copy out” semantics (rule 2.3) would be very visible. In other
words, while it normally holds that array[0] === array[0], this
would not hold for an array like points, because accessing an
element of points copies it out.

Hence, without rule 2.2, points[0] !== points[0]. But that’s no good
– we want it to be invisible when copies occur, at least if there
are no mutations going on. But because value objects compare for
equality by comparing their fields, there is no problem. points[0] === points[0]
even though each time we evaluate points[0] we get (at least if we don’t
optimize) a fresh object with a fresh buffer.

One little quirk of these rules, though it’s not inconsistent in some
sense, is that if you try to mutate a field of a value type embedded
within an array, it doesn’t work, even though you could overwrite the
value type as a whole. In other words:

print(points[0].x);   // 0 to start
points[0].x = 1;
print(points[0].x);   // still 0
points[0] = {x: 1, y: 2};
print(points[0].x);   // now 1

The reason for this is that points[0].x first evaluates points[0],
which yields a fresh Point temp, and then does temp.x. But
assigning to a field of a value object like temp has no effect, and
hence the assignment is lost.

Frozen arrays

One can easily define a frozen array type of a fixed length:

var A = T.arrayType(N).valueType();

One can then instantiate this type using an example instance:

var a = new A([...]);

Or perhaps with some sort of build method that is yet to be
specified (though the current PJS strawman incorporates this sort of
thing):

var a = A.build(i => /* create value for index `i` */);

In general, though, we don’t encourage the creation of array types.
Instead, we prefer that people create arrays directly when possible:

var mutableArray = T.array(N);

So perhaps we want a similar accessor for creating a frozen array:

var frozenArray = T.valueArray(N, [...]);

It’s not clear how build fits into this scenario. Perhaps the
initializer can also be a function. I don’t know, there’s some
bikeshedding to be done here.

A side note: Integration with Map and Set

ES6 is adding some very useful types called Map and Set. These
are more powerful data structures for storing objects. One problem
with them, however, is that they are always keyed on object
identity. This means that if you wanted to have, say, a map keyed by
Color, and you defined a Color class, your lookup is not going to
have the semantics you expect, because two distinct Color instances
that both represent “red” will nonetheless be considered unequal.

Using value types addresses this problem in a simple way without
requiring user-defined comparators and the like. Since the identity
semantics of value objects are based on their fields, if you use a
value type for Color you will get the lookup you expect. Hooray!

(Sorry if this is unclear; this post is long enough as it is and I
don’t want to draw out the examples, but I thought this was an
interesting and not entirely obvious interaction.)

Appendices

Appendix A. Embedding non-value-types within value types.

We could permit non-value types to be embedded within value types.
This would imply that value-ness, like opacity, is not necessarily
tied to the type but rather to the instance. I have avoided this
design for two reasons:

I think the semantics of embedding a non-value-type into a
value-type are non-obvious. It has to mean that the embedded
non-value-type becomes immutable or else valueness has little
meaning, but this is potentially confusing and I’d rather just
avoid the question altogether.

It interferes with optimization to have mutability be per-instance
rather than something that is uniquely determined by the type. Not
that it can’t be overcome, but why bother if it’s not a feature we
particularly want.

Appendix B. The semantics of assignments to properties of value types.

I’ve been assuming we want assignments to properties of value objects
to be silently dropped, for consistency with frozen fields. Of course
I’d prefer that they throw an exception. That doesn’t really affect
much else in the rules here. (I also don’t remember the semantics of
assignments to frozen fields in strict mode – perhaps value objects
should just behave exactly like frozen fields do.)

2014-04-01T15:20:00-04:00
http://smallcultfollowing.com/babysteps/blog/2014/04/01/typed-objects-status-report

I recently wrote up a
paper describing the current version of the Typed Objects API. Anyone
who is interested in the current state of the art in that
specification should take a look. It’s not too long and intended to be
an easy read. This is just a draft copy, and feedback is naturally
very welcome – in particular, I expect that before we submit it, the
implementation section will change, since it will be much further
along.

Dmitry and I have also been hard at work on the
actual specification itself and naturally I’ve been working on
the implementation too. The most significant deviation between the
current implementation and the intended specification is described by
Bug 973238 – basically the way we handle arrays is not
right. I’m about 16 patches into the process of fixing that: it
affects a lot of code and I’m trying to do it carefully. Overall,
though, the new model is making the code much cleaner, so I’m excited
about that.

I’ve also been working on an upcoming blog post describing an
extension to typed objects that supports value types – that is,
immutable objects representing small, identity-less values like
colors, points, and so forth. That should be coming soon. It’ll build
on the API described in the draft paper, so you might want to
read that first. ;)

2014-02-28T10:48:00-05:00
http://smallcultfollowing.com/babysteps/blog/2014/02/28/rust-rfc-opt-in-builtin-traits

In today’s Rust, there are a number of builtin traits (sometimes
called “kinds”): Send, Freeze, Share, and Pod (in the future,
perhaps Sized). These are expressed as traits, but they are quite
unlike other traits in certain ways. One way is that they do not have
any methods; instead, implementing a trait like Freeze indicates
that the type has certain properties (defined below). The biggest
difference, though, is that these traits are not implemented manually
by users. Instead, the compiler decides automatically whether or not a
type implements them based on the contents of the type.

In this proposal, I argue to change this system and instead have users
manually implement the builtin traits for new types that they define.
Naturally there would be #[deriving] options as well for
convenience. The compiler’s rules (e.g., that a sendable value cannot
reach a non-sendable value) would still be enforced, but at the point
where a builtin trait is explicitly implemented, rather than being
automatically deduced.

There are a couple of reasons to make this change:

Consistency. All other traits are opt-in, including very common
traits like Eq and Clone. It is somewhat surprising that the
builtin traits act differently.

API Stability. The builtin traits that are implemented by a
type are really part of its public API, but unlike other similar
things they are not declared. This means that seemingly innocent
changes to the definition of a type can easily break downstream
users. For example, imagine a type that changes from POD to non-POD
– suddenly, all references to instances of that type go from
copies to moves. Similarly, a type that goes from sendable to
non-sendable can no longer be used as a message. By opting in to
being POD (or sendable, etc), library authors make explicit what
properties they expect to maintain, and which they do not.

Pedagogy. Many users find the distinction between pod types
(which copy) and linear types (which move) to be surprising. Making
pod-ness opt-in would help to ease this confusion.

Safety and correctness. In the presence of unsafe code,
compiler inference is unsound, and it is unfortunate that users
must remember to “opt out” from inapplicable kinds. There are also
concerns about future compatibility. Even in safe code, it can also
be useful to impose additional usage constraints beyond those
strictly required for type soundness.

I will first cover the existing builtin traits and define what they
are used for. I will then explain each of the above reasons in more
detail. Finally, I’ll give some syntax examples.

The builtin traits

We currently define the following builtin traits:

Send – a type that deeply owns all its contents.
(Examples: int, ~int, not &int)

Freeze – a type that is deeply immutable when accessed via an &
reference.

Share – a type that is threadsafe when accessed via an & reference
(i.e., when aliased).

Pod – “plain old data”, a type that can be copied merely by copying
bits.
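
Under this proposal, one would instead write explicit (empty) impls, or
use deriving, to opt in. A rough sketch:

struct MyPoint { x: int, y: int }

impl Pod for MyPoint { }  // opt in to implicit copying
impl Send for MyPoint { } // opt in to sendability

// or, more conveniently:
#[deriving(Pod, Send)]
struct MyOtherPoint { x: int, y: int }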

These impls would follow the usual coherence requirements. For
example, a struct can only be declared as Share within the crate
where it is defined.

For convenience, I also propose a deriving shorthand
#[deriving(Data)] that would implement a “package” of common traits
for types that contain simple data: Eq, Ord, Clone, Show,
Send, Share, Freeze, and Pod.

Pod and linearity

One of the most important aspects of this proposal is that the Pod
trait would be something that one “opts in” to. This means that
structs and enums would move by default unless their type is
explicitly declared to be Pod. So, for example, the following
code would be in error:
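
For example (a sketch; Pair is just an illustrative struct that has not
opted in to Pod):

struct Pair { x: int, y: int } // no Pod impl, hence linear

fn main() {
    let p = Pair { x: 1, y: 2 };
    let _q = p;          // moves `p`, since Pair is not Pod
    println!("{}", p.x); // ERROR: use of moved value `p`
}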

If you do nothing, your type is linear, meaning that it moves
from place to place and can never be copied in any way. (We need a
better name for that.)

If you implement Clone, your type is cloneable, meaning that it
moves from place to place, but it can be explicitly cloned. This is
suitable for cases where copying is expensive.

If you implement Pod, your type is plain old data, meaning that
it is just copied by default without the need for an explicit
clone. This is suitable for small bits of data like ints or
points.

What is nice about this change is that when a type is defined, the
user makes an explicit choice between these three options.

Consistency

This change would bring the builtin traits more in line with other
common traits, such as Eq and Clone. On a historical note, this
proposal continues a trend, in that both of those operations used to
be natively implemented by the compiler as well.

API Stability

The set of builtin traits implemented by a type must be considered
part of its public interface. At present, though, it’s quite invisible
and not under user control. If a type is changed from Pod to
non-pod, or Send to non-send, no error message will result until
client code attempts to use an instance of that type. In general we
have tried to avoid this sort of situation, and instead have each
declaration contain enough information to check it independently of its
uses. Issue #12202 describes this same concern, specifically with
respect to stability attributes.

Making opt-in explicit effectively solves this problem. It is clearly
written out which traits a type is expected to fulfill, and if the
type is changed in such a way as to violate one of these traits, an
error will be reported at the impl site (or #[deriving]
declaration).

Pedagogy

When users first start with Rust, ownership and ownership transfer is
one of the first things that they must learn. This is made more
confusing by the fact that types are automatically divided into pod
and non-pod without any sort of declaration. It is not necessarily
obvious why a T and ~T value, which are semantically equivalent,
behave so differently by default. Making the pod category something you
opt into means that types will all be linear by default, which can
make teaching and learning easier.

Safety and correctness: unsafe code

For safe code, the compiler’s rules for deciding whether or not a type
is sendable (and so forth) are perfectly sound. However, when unsafe
code is involved, the compiler may draw the wrong conclusion. For such
cases, types must opt out of the builtin traits.

In general, the opt out approach seems to be hard to reason about:
many people (including myself) find it easier to think about what
properties a type has than what properties it does not have,
though clearly the two are logically equivalent in this binary world
we programmers inhabit.

More concretely, opt out is dangerous because it means that types with
unsafe methods are generally wrong by default. As an example,
consider the definition of the Cell type:

struct Cell<T> {
priv value: T
}

This is a perfectly ordinary struct, and hence the compiler would
conclude that cells are freezable (if T is freezable) and so forth.
However, the methods attached to Cell use unsafe magic to mutate
value, even when the Cell is aliased:
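
Approximately (a sketch: the marker fields are the two discussed below,
with plausible marker types of the era, and transmute_mut stands in for
the actual unsafe implementation):

struct Cell<T> {
    priv value: T,
    priv marker1: marker::InvariantType<T>,
    priv marker2: marker::NoFreeze,
}

impl<T: Pod> Cell<T> {
    fn set(&self, value: T) {
        unsafe {
            // mutate `value` through an & reference -- this is the
            // unsafe magic that the compiler's inference cannot see
            let this: &mut Cell<T> = cast::transmute_mut(self);
            this.value = value;
        }
    }
}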

Note the two markers. The first, marker1, is a hint to the variance
engine indicating that the type Cell must be invariant with respect
to its type argument. The second, marker2, indicates that Cell is
non-freeze. This then informs the compiler that the referent of a
&Cell<T> can’t be considered immutable. The problem here is that, if
you don’t know to opt-out, you’ll wind up with a type definition that
is unsafe.

This argument is rather weakened by the continued necessity of a
marker::InvariantType marker. This could be read as an argument
towards explicit variance. However, I think that in this particular
case, the better solution is to introduce the Mut<T> type described
in #12577 – the Mut<T> type would give us the invariance.

Using Mut<T> brings us back to a world where any type that uses
Mut<T> to obtain interior mutability is correct by default, at least
with respect to the builtin kinds. Types like Atomic<T> and
Volatile<T>, which guarantee data race freedom, would therefore have
to opt in to the Share kind, and types like Cell<T> would simply
do nothing.

Safety and correctness: future compatibility

Another concern about having the compiler automatically infer
membership into builtin bounds is that we may find cause to add new
bounds in the future. In that case, existing Rust code which uses
unsafe methods might be inferred incorrectly, because it would not
know to opt out of those future bounds. Therefore, any future bounds
will have to be opt out anyway, so perhaps it is best to be
consistent from the start.

Safety and correctness: semantic constraints

Even if type safety is maintained, some types ought not to be copied
for semantic reasons. An example from the compiler is the
Datum<Rvalue> type, which is used in code generation to represent
the computed result of an rvalue expression. At present, the type
Rvalue implements an (empty) destructor – the sole purpose of this
destructor is to ensure that datums are not consumed more than once,
because this would likely correspond to a code gen bug, as it would
mean that the result of the expression evaluation is consumed more
than once. Another example might be a newtype’d integer used for
indexing into a thread-local array: such a value ought not to be
sendable. And so forth. Using marker types for these kinds of
situations, or empty destructors, is very awkward. Under this
proposal, users need merely refrain from implementing the relevant
traits.

The Sized bound

In DST, we plan to add a Sized bound. I do not feel that users
should manually implement Sized. It seems tedious and rather
ludicrous.

Counterarguments

The downsides of this proposal are:

There is some annotation burden. I had intended to gather statistics
to try and measure this but have not had the time.

If a library forgets to implement all the relevant traits for a
type, there is little recourse for users of that library beyond pull
requests to the original repository. This is already true with
traits like Eq and Ord. However, as SiegeLord noted on IRC, you
can often work around the absence of Eq with a newtype
wrapper, but this is not true if a type fails to implement Send or
Pod. This danger (forgetting to implement traits) is essentially
the counterbalance to the “forward compatibility” case made above:
where implementing traits by default means types may implement too
much, forcing explicit opt in means types may implement too little.
One way to mitigate this problem would be to have a lint for when an
impl of some kind (etc) would be legal, but isn’t implemented, at
least for publicly exported types in library crates.

2014-02-25T12:29:00-05:00
http://smallcultfollowing.com/babysteps/blog/2014/02/25/rust-rfc-stronger-guarantees-for-mutable-borrows

Today, if you do a mutable borrow of a local variable, you lose the
ability to write to that variable except through the new reference
you just created:

let mut x = 3;
let p = &mut x;
x += 1; // Error
*p += 1; // OK

However, you retain the ability to read the original variable:

let mut x = 3;
let p = &mut x;
print(x); // OK
print(*p); // OK

I would like to change the borrow checker rules so that both writes
and reads through the original path x are illegal while x is
mutably borrowed. This change is not motivated by soundness, as I
believe the current rules are sound. Rather, the motivation is that
this change gives strong guarantees to the holder of an &mut
pointer: at present, they can assume that an &mut referent will not
be changed by anyone else. With this change, they can also assume
that an &mut referent will not be read by anyone else. This enables
more flexible borrowing rules and a more flexible kind of data
parallelism API than what is possible today. It may also help to
create more flexible rules around moves of borrowed data. As a side
benefit, I personally think it also makes the borrow checker rules
more consistent (mutable borrows mean original value is not usable
during the mutable borrow, end of story). Let me lead with the
motivation.

Brief overview of my previous data-parallelism proposal

In a previous post I outlined a plan for
data parallelism in Rust based on closure bounds. The rough idea
is to leverage the checks that the borrow checker already does for
segregating state into mutable-and-non-aliasable and
immutable-but-aliasable. This is not only the recipe for creating
memory safe programs, but it is also the recipe for data-race freedom:
we can permit data to be shared between tasks, so long as it is
immutable.

The API that I outlined in that previous post was based on a fork_join
function that took an array of closures. You would use it like this:
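
Something like this (quick_sort and Config are just stand-ins for the
actual work):

fn sort_halves(left: &mut [int], right: &mut [int], x: &Config) {
    fork_join([
        || quick_sort(left, x),  // this closure mutates `left`...
        || quick_sort(right, x), // ...and this one mutates `right`
    ]);
}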

The idea of fork_join was that it would (potentially) fork into N
threads, one for each closure, and execute them in parallel. These
closures may access and even mutate state from the containing scope –
the normal borrow checker rules will ensure that, if one closure
mutates a variable, the other closures cannot read or write it. In
this example, that means that the first closure can mutate left so
long as the second closure doesn’t touch it (and vice versa for
right). Note that both closures share access to x, and this is
fine because x is immutable.

This kind of API isn’t safe for all data though. There are things that
cannot be shared in this way. One example is Cell, which is Rust’s
way of cheating the mutability rules and making a value that is
always mutable. If we permitted two threads to touch the same
Cell, they could both try to read and write it and, since Cell
does not employ locks, this would not be race free.

To avoid these sorts of cases, the closures that you pass to
fork_join would be bounded by the builtin trait Share. As I
wrote in issue 11781, the trait Share indicates data that
is threadsafe when accessed through an &T reference (i.e., when
aliased).

Most data is sharable (let T stand for some other sharable type):

POD (plain old data) types are sharable, so things like int etc.

&T and &mut T, because both are immutable when aliased.

~T is sharable, because it is not aliasable.

Structs and enums that are composed of sharable data are sharable.

ARC, because the reference count is maintained atomically.

The various thread-safe atomic integer intrinsics and so on.

Things which are not sharable include:

Many types that are unsafely implemented:

Cell and RefCell, which have non-atomic interior mutability

Rc, which uses non-atomic reference counting

Managed data (Gc<T>) because we do not wish to
maintain or support a cross-thread garbage collector

There is a wrinkle though. With the current borrow checker rules,
forkable data is only safe to access from a parallel thread if the
main thread is suspended. Put another way, forkable closures can
only run concurrently with other forkable closures, but not with the
parent, which might not be a forkable thing.

This is reflected in the API, which consisted of a single
fork_join function that both spawned the threads and joined them.
The natural semantics of a function call would thus cause the parent
to block while the threads executed. For many use cases, this is just
fine, but there are other cases where it’s nice to be able to fork off
threads continuously, allowing the parent to keep running in the
meantime.

Note: This is a refinement of the previous proposal, which was
more complex. The version presented here is simpler but equally
expressive. It will work best when combined with my (ill documented,
that’s coming) plans for unboxed closures, which are required
to support convenient array map operations and so forth.

A more flexible proposal

If we made the change that I described above – that is, we prohibit
reads of data that is mutably borrowed – then we could adjust the
fork_join API to be more flexible. In particular, we could support
an API like the following:
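
Continuing the same stand-in example from above:

fn sort_halves(left: &mut [int], right: &mut [int], x: &Config) {
    fork_join_section(|sched| {
        sched.fork(|| quick_sort(left, x));
        sched.fork(|| quick_sort(right, x));
        // ...the parent can keep executing while the forks run...
    }); // all forked tasks are joined when the section ends
}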

The idea here is that we replaced the fork_join() call with a call
to fork_join_section(). This function takes a closure argument and
passes it an argument sched – a scheduler. The scheduler offers a
method fork that can be invoked to fork off a potentially parallel
task. This task may begin execution immediately and will be joined
once the fork_join_section ends.

In some sense this is just a more verbose replacement for the previous
call, and I imagine that the fork_join() function I showed
originally will remain as a convenience function. But in another sense
this new version is much more flexible – it can be used to fork off
any number of tasks, for example, and it permits the main thread to
continue executing while the fork runs.

An aside: it should be noted that this API also opens the door
(wider) to a kind of anti-pattern, in which the main thread quickly
enqueues a ton of small tasks before it begins to operate on
them. This is the opposite of what (e.g.) Cilk would do. In Cilk, the
processor would immediately begin executing the forked task, leaving
the rest of the “forking” in a stealable thunk. If you’re lucky, some
other proc will come along and do the forking for you. This can reduce
overall overhead. But anyway, this is fairly orthogonal.

Beyond parallelism

The stronger guarantee concerning &mut will be useful in other
scenarios. One example that comes to mind is moves: for example,
today we do not permit moves out of borrowed data. In principle,
though, we should be able to permit moves out of &mut data, so long
as the value is replaced before anyone can read it.

Without the rule I am proposing here, though, it’s really hard to
prevent reads at all without tracking what pointers point at (which we
do not do nor want to do, generally). Consider even a simple program
like the following:
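
Something along these lines, where the move out of *p is the
hypothetical operation under discussion:

let mut v = ~"hello";
let p = &mut v;
let s = *p;        // (hypothetically) move out of the &mut referent
println!("{}", v); // under today's rules this read is legal, yet it
                   // would observe the moved-from value
*p = ~"goodbye";   // the replacement comes too late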

I don’t want to dive into the details of moves here, because
permitting moves from borrowed pointers is a complex topic of its own
(we must consider, for example, failure and what happens when
destructors run). But without the proposal here, I think we can’t even
get started.

Speaking more generally and mildly more theoretically, this rule helps
to align Rust logic with separation logic. Effectively, &mut
references are known to be separated from the rest of the heap. This is
similar to what research languages like Mezzo do. (By the way,
if you are not familiar with Mezzo, check it out. Awesome stuff.)

Impact on existing code

It’s hard to say what quantity of existing code relies on the current
rules. My gut tells me “not much” but without implementing the change
I can’t say for certain.

How to implement

Implementing this rule requires a certain amount of refactoring in the
borrow checker (refactoring that is needed for other reasons as well,
however). In the interest of actually completing this blog post, I’m
not going to go into more details (the post has been sitting for some
time waiting for me to have time to write this section). If you think
you might like to implement this change, though, let me know. =)

2014-02-04T22:39:00-05:00
http://smallcultfollowing.com/babysteps/blog/2014/02/04/closures-and-the-borrow-checker

I have been working on making the borrow checker treat closures in a
sound way. I hope to land this patch very soon. I want to describe the
impact of these changes and summarize what we might do in the future.

The high-level idea

The basic idea is that the borrow checker will treat a closure as if
it were constructing a record with one borrowed pointer for each
variable mentioned in the closure.
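
Consider a function foo() along these lines (each_pair is a stand-in
for any function that accepts the closure):

fn foo(map: &mut HashMap<~str, int>) {
    each_pair(|k, v| map.insert(k, v));
}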

The borrow checker will treat the closure that appears in foo() as
if it borrowed map mutably for its entire lifetime. It’s kind of
roughly as if the code were written as shown below, where the closure
expression |k, v| map.insert(k, v) has been replaced with an
explicit pair (&mut env, callback) that combines the environment
env with a code pointer (this is of course what happens at runtime):
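
Roughly, keeping the stand-in each_pair from above:

struct Env<'a> {
    map: &'a mut HashMap<~str, int>,
}

fn callback<'a>(env: &mut Env<'a>, k: ~str, v: int) {
    env.map.insert(k, v);
}

fn foo(map: &mut HashMap<~str, int>) {
    let mut env = Env { map: map };  // borrows `map` mutably...
    each_pair((&mut env, callback)); // ...for as long as `env` is used
}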

This has the nice property that the borrow checker’s treatment of
closures is kind of a simple variation on its treatment of other kinds
of structures.

Implications

There are all sorts of issues in the issue tracker showing how the
current treatment of closures is unsound. Clearly the most important
impact of these changes is fixing all of those issues. However, the
changes also cause some reasonable code that used to work to no longer
work. I encountered two major kinds of errors, which I will describe
here along with the workarounds.

Errors due to closures borrowing more than is necessary

The first arises because, as the analysis is currently written, closures
always borrow an entire local variable, but sometimes they only use a
subpath. Let me give an example to show what I mean:

This version works because neither closure is borrowing cx; rather,
they are borrowing different local variables (cx_ints and
cx_chars). The borrows of cx, meanwhile, are taking place in the
main function body, and the borrow checker can see that they refer to
different fields and hence are legal.

It’s quite possible that we could improve the safety analysis to
automatically consider when a closure only borrows specific fields of
a local variable. In other words, we could perhaps do a better rewrite
and thus avoid the need to introduce the extra local variables.

Errors due to closures sharing mutable and immutable data

Currently in Rust we have only mutable and immutable borrows. Note
that these two things are mutually exclusive, because the same memory
cannot be both constant and changing at the same time. We used to have
const borrows, which meant “possibly mutable but not by me”, but we
removed them in an effort to keep the language simple. This decision
impacts some closure patterns, most notably the try_finally pattern.

Here is a very simple and artificial example. Suppose you wanted to
read items and, at the end, send a message indicating how many items you
had read – and this message must be sent, even if you fail
unexpectedly. You might write the code something like this:
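
Perhaps like so (read_item, process, and chan are stand-ins):

let mut total_read = 0;
(|| {
    loop {
        match read_item() {
            Some(item) => {
                total_read += 1;
                process(item);
            }
            None => break
        }
    }
}).finally(|| {
    chan.send(total_read) // must happen even on failure
})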

This is relying on the try-finally module, which adds a finally
method to closures. The main closure will be called and then the
finally closure will be called, regardless of whether the main
closure failed.

Under the new rules, this code will not type-check. This is because
the main closure is borrowing total_read mutably (so that it can be
incremented) and the finally closure is borrowing total_read
immutably (so that it can be read). In general, I think this is a
pretty reasonable rule: if failure occurs in the try clause, chances
are that anything it is mutating is in a pretty messed up state, so
you probably don’t want to be reading it (this is the same reasoning
behind Rust’s general “fail fast” philosophy). Nonetheless, in this
case, since all we’re talking about is an integer, it’s clearly ok.

There are two ways we can rewrite this example so that it type checks.
Perhaps the simplest is to employ a Cell type, which is Rust’s
general purpose tool for permitting mutability in aliasable data. The
idea of Cell is that, given an immutable pointer to a Cell, you can
still mutate the cell’s contents using the get() and set()
methods. Cell is not well-suited to all types, but it works great
for integers and other scalars. Here is the example rewritten to use
Cell:
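
The same example, using a Cell:

let total_read = Cell::new(0);
(|| {
    loop {
        match read_item() {
            Some(item) => {
                total_read.set(total_read.get() + 1);
                process(item);
            }
            None => break
        }
    }
}).finally(|| {
    chan.send(total_read.get())
})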

This code is a bit more awkward because to access the value of the
total_read value I must write total_read.get(), and to update the
value I write total_read.set(). However, it type checks, because
both closures can share access to the same Cell.

I also added a more “full-featured” variation on the try-finally
API. The idea is that this signature takes two closures, as before,
but it also takes two additional bits of data: first, some shared
mutable state that both closures will have access to, and second, some
state that will be moved into the try closure for it to use as it
likes (this second parameter would not be needed if we added support
for once closures).
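
One possible signature and use, sketched:

fn try_finally<T, U, R>(mutate: &mut T,
                        drop: U,
                        try_fn: |&mut T, U| -> R,
                        finally_fn: |&mut T|) -> R {
    /* ...call try_fn, then finally_fn, even on failure... */
}

let mut total_read = 0;
try_finally(&mut total_read, (),
            |total_read, _| {
                /* ...read items, incrementing *total_read... */
            },
            |total_read| chan.send(*total_read));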

What happens here is that we borrow total_read once, mutably, and
pass it into try_finally(). try_finally() then takes this mutable
pointer and passes it to both the try and finally closure in turn. (In
this case, we don’t need to move any state into the try closure, so we
just pass the unit value () as the second argument.)

Conclusions and future work

For now, I opted to keep the design simple, and this leads to some
spurious errors. I expect we will eventually improve the borrow
checker so that it considers full path borrows; this seems clearly
better (but also clearly an extension).

I am not so certain about the second class of error. In my original
design, I included const borrows so that simple scalar values like
total_read could be updated by one closure and read by another. I
removed this in order to make the semantics better match the kinds of
borrows we find elsewhere in the language. It turned out not to affect
much code – only two or three functions – in the current
codebase. Given the reasonable workarounds available, and limited
importance of this situation, I’m inclined to leave the rules as I
have described them. But if it proves that this kind of error arises
frequently “in the wild”, we could consider adding const borrows back.

The branch with these changes is
issue-6801-borrowck-closures. It is currently passing tests and
I hope to clean it up a bit and open a pull request soon.

2014-01-09T01:04:00-05:00
http://smallcultfollowing.com/babysteps/blog/2014/01/09/rvalue-lifetimes-in-rust

I’ve been working on Issue #3511, which is an effort to
rationalize the lifetimes of temporary values in Rust. This issue has
been a thorn in the side of Rust users for a while, because the
current lifetimes are rather haphazard and frequently too short. Some
time ago, I did some thinking on this issue and then let it lie
while other things took priority.

Part of the reason that this issue has lasted so long is that the
current trans cleanup scheme is very inflexible. I have a
branch now that rewrites the cleanup system so that it can
handle any rules we would like. The problem I am encountering now, of
course, is that it’s unclear what the rules should be. I want to lay
out the options I see.

The problem

There are numerous situations in which Rust users borrow temporary
values; the tricky part is deciding what the lifetime of these
temporary values ought to be. Put another way, when should we run the
destructor for these temporaries? To see what I mean, let me show a
few examples: I’ll focus on three use cases that I think are fairly
representative, and which I see appearing in the code base a lot.
It’s possible though that I’m missing some good examples.

Example 1: borrowing an rvalue. Here is the first example:

let map = &mut HashMap::new(); // (1)

The effect of this expression is to create a temporary stack variable
and create a pointer to it. It is roughly equivalent to something
like this:

let _temp = HashMap::new();
let map = &mut _temp;

The question that I want to consider in this post is what the lifetime
of this temporary ought to be. In the explicit expansion, the lifetime
of the temporary is clear: it will live as long as the explicit
variable _temp. But the correct semantics for the first example
are less clear (as we’ll see).

Applying the & operator to an rvalue might seem a bit silly at
first. Why not write it using an explicit temporary, after all? There
are a couple of reasons though that it’s worth supporting, but the
most crucial one is in macros. In macros, it’s useful to be able to
apply the borrow operator to any expression in order to avoid moving
or copying values when you don’t want to. For example, the expansion
of assert_eq!($a, $b) is something like:

{
let _a = & $a;
let _b = & $b;
if _a != _b {
fail!(...)
}
}

If we didn’t use the & operator, then the first two lines might
cause inappropriate moves. For example, if I wrote assert_eq!(x.a,
y.b), and the type of x.a was affine, then assert_eq! would move
from both x.a and y.b. Not so good.

Example 2: ref bindings and rvalues. The second example is in some
way just a different syntax for the same thing (or, I should say,
something which probably ought to be the same, though some of the
rules I describe do not treat it the same way):

let ref mut map = HashMap::new();

Example 3: Autoref in method calls. Method calls in Rust typically
take borrowed pointers to their receivers, but one rarely writes this
explicitly. Instead, the receiver is implicitly borrowed via a
mechanism called “autoref” (this is actually the same as in C++,
except that in C++ all method calls are autoref’d, whereas in Rust you
can also have method calls that take the receiver by value and not by
reference).

One example that is becoming rather common in the rustc code base is
the RefCell type. RefCell is a standard library type that allows
some of the Rust compiler’s static checks to be converted into dynamic
checks; it replaces @mut, which bundled together dynamic checks with
managed data, and just isolates out the dynamic check portion so that
it can be reused with other smart pointer types.

The way that RefCell works is that you invoke the borrow or
borrow_mut methods:

let map: RefCell<HashMap<K,V>> = ...;
let mut r = map.borrow_mut();

These methods check some bits to ensure that the value is not borrowed
in an incompatible way already (basically: a mutable borrow must not
overlap with any other borrows). These methods then toggle some bits
and return a special Ref type (r in the example above). This Ref
type has a destructor which resets the bit, effectively ending the
borrow. In the meantime, the Ref type can be used to get access to
the data itself:

let data = r.get();

Note that there is an implicit borrow of the variable r occurring here.
In a sense, that method call could be expanded to:

let data = (&mut r).get();

The key point here is that the lifetime of the variable r exactly
corresponds to the lifetime of the dynamic borrow of map. Therefore,
having a good understanding of when the destructor for r will run is
crucial to know how long your map is borrowed for. For example, if you
write code like the following, you will get a fatal error:
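
For example:

let map: RefCell<HashMap<K,V>> = ...;
let mut r = map.borrow_mut();
let mut r2 = map.borrow_mut(); // fatal error: `map` is already borrowed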

The problem is that the second borrow (r2) occurs before the first
borrow has completed; more operationally, the second borrow occurs
before the destructor for r executes.

The lifetime of a borrow is relatively clear so long as explicit
temporaries are used, as I showed so far. But it’s kind of verbose.
For example, to insert an item into a map, I have to write something
like:

let mut r = map.borrow_mut();
r.get().insert(k, v);

It’d be nicer if we could remove the temporary:

map.borrow_mut().get().insert(k, v);
// temporary: ^~~~^

But this gets right back to the issue we’re talking about, because the
call to get() in fact takes the address of the receiver, and in this
case the receiver is an rvalue map.borrow_mut().

Some solutions

I want to explore various rules we could use to decide when the
destructors run, and see what the effect would be on each example.

Solution 0: Innermost enclosing statement.

My first attempt (what is currently written on the branch) was to make
all temporaries tied to the innermost enclosing statement (roughly,
see Appendix B for full details). I think this does the right thing
for example 3, in that it releases the borrow at the end of the statement:
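
That is, the temporary Ref would be dropped at the semicolon:

map.borrow_mut().get().insert(k, v);
// the temporary `Ref` is dropped here, ending the dynamic borrow of `map`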

Examples 1 and 2, however, do not work under this rule: if the hashmap
only lives as long as the statement, the value in map is destructed as
soon as it is assigned, and thus cannot safely be used by the following
statements. That is, the following code would access freed memory:

let map = &mut HashMap::new();
map.insert(...);

I think this solution is not workable because one cannot write the
assert_eq macro above.

Solution 1: Innermost enclosing block.

To address the problem with solution 0, we might try to use the innermost
enclosing block. This makes examples 1 and 2 work fine, but example 3
doesn’t work so well:

map.borrow_mut().get().insert(k, v); // (3)

The problem is that here the borrow isn’t released until the end of the
enclosing block, rather than the enclosing statement. This is probably
way too late. For example, code like the following would fail dynamically:
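
For instance:

map.borrow_mut().get().insert(k1, v1); // Ref lives to end of block...
map.borrow_mut().get().insert(k2, v2); // ...so this second borrow fails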

I think this solution is not workable because it is too painful
to work with RefCell.

Solution 2: Variations on the C++ rule (roughly).

Interestingly, C++ has a similar problem concerning temporaries, and
they have a rather custom rule that attempts to address exactly the
issue I encountered with solution 0. The C++ rule, as I understand it,
is that temporaries live as long as the innermost enclosing statement,
unless the temporary is assigned to an (reference) variable, in which
case it lives as long as that variable.

So, for example, if I had a call to a function that took a map rvalue
reference, as follows:

V& find(const map<K,V>& m) { ... }
use(find(map(...)));

then the map will be freed after use() returns. This is true even
though the temporary was created as an argument to find(). Basically
the temporary will live until the next semicolon, roughly speaking.

Now there is one exception to this rule. If I assign the temporary to
a variable, then it lives as long as the variable:

const map<K,V>& m = map(...);

In this case, the destructor for map will run once m goes out of
scope.

It is a bit challenging to make a direct equivalent to this rule in
Rust. For one thing, we have explicit borrows (the & operator) and
also ref bindings. For another, assignments can be more complicated,
e.g.:

let Foo { a: ref a, b: b } = create_foo();

In this case, one of the fields is bound by reference, but the other
is moved.

Variation A. Still, we could make a rule that says something like this:
let bindings where the initializer is an rvalue first store the initializer
into a temporary with the lifetime of the innermost block, and then assign
from that temporary into the pattern. So effectively let pat = rvalue becomes:
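
That is:

// let pat = rvalue;
// becomes:
let _temp = rvalue; // `_temp` has the lifetime of the innermost block
let pat = _temp;    // the pattern is then bound from `_temp`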

Example 1 is unaffected by the rule and hence still an error for the
same reasons as Solution 0: the temporary created by the explicit
borrow goes out of scope at the end of the let statement, rather
than the block. Example 2 would work, though, because the temporary in
that case would have an extended lifetime. In the case of Example 3
(the RefCell), the borrow would terminate at the end of the
statement, as desired.

Variation B. Another option would be to say that the borrow
operator & uses the lifetime of the innermost enclosing block, but
all other temporaries use the innermost enclosing statement. This rule
is easier to explain than variation A, and it has the opposite effect
on examples 1 and 2: example 1 (the explicit & borrow) would now work,
while example 2 (the ref binding) would be an error.

Variation C. A third option would be to combine the two: both the
explicit & operator and let initializers would use the lifetime of the
innermost enclosing block, while all other temporaries would use the
innermost enclosing statement. This treats examples 1 and 2 the same
way (both would work), while the RefCell borrow in example 3 would
still end at the statement.

Summary. I think variations A, B, or C would all be potentially
workable.

Solution 3: Inference.

Finally, we can rely on inference. Essentially the compiler would
decide the smallest lifetime that makes the program legal. This makes
all of the examples I’ve given work, but at a cost in predictability
– it’s hard to know when your destructors run. For things like
RefCell, this is of course a potential problem. Overall, while I
think inference is workable, it is almost universally unpopular,
simply because people do not like the idea of an ill-defined lifetime
inference algorithm dictating when their destructor will execute.

Conclusions

All in all I guess I lean towards some variation of Solution 2. I
like Variation B (make the lifetime of the explicit & operator be
the innermost enclosing block; otherwise, innermost enclosing
statement) because it’s easy to express and implement, but I also like
Variation C because it treats examples 1 and 2 the same way. Whatever
we do, having an explicit option seems like a good idea (see Appendix
A).

Appendix A. Explicit annotation

Regardless of what rule we pick, it is possible to permit users to
explicitly annotate temporary lifetimes. One of the motivations for
the current lifetime syntax was to permit users to annotate blocks
(and perhaps statements/expressions) with lifetime names and then
refer to those later. For example, one might create a temporary and
explicitly state that it should be destructed in an outer block:

'a: {
{
let m = &'a mut HashMap::new();
...
}
}

There would be some limits to these explicit temporaries. For example,
you could not create a temporary in an outer block if you are within
an if or loop statement (this is needed to ensure fixed size
stacks and to ensure we know what values to run destructors on
statically).

Appendix B. Tail expressions in block.

In many of the rules above, I’ve referenced the innermost enclosing
block or statement. But what is the innermost enclosing block or
statement in a situation like:

let v = {
&mut HashMap::new()
};

It might be nice to make the tail expression in a block belong,
effectively, to its parent.

In my existing code (which implements Solution 0), the actual rule is
not “innermost enclosing statement” but rather “innermost enclosing
statement, loop body, or function”. I do not consider the tail
expression of a block to be in a statement. This means that
temporaries in the tail expression effectively have the lifetime of
the statement (or loop body, or function body) in which the block
appears.

Appendix C. Match expressions.

Ref bindings can also appear in match expressions, of course.
Regardless, I think the semantics of match on an rvalue probably ought
to be that the temporary value lives as long as the enclosing
statement, regardless of what bindings it contains; that seems to be
what most people expect.

2014-01-05T11:39:00-05:00
http://smallcultfollowing.com/babysteps/blog/2014/01/05/dst-take-5

I believe I have come to the point where I am ready to make a final
proposal for DST. Ironically, this proposal is quite similar to where
I started, but somewhat more expansive. It seems to be one of those
unusual cases where supporting more features actually makes things
easier. Thanks to Eridius on IRC for pointing this out to me. I
intend for this post to stand alone, so I’m going to start from the
beginning in the description.

I am reasonably confident that this DST proposal hangs together
because I have taken the time to develop a formal model of Rust and
then to extend that model with DST. The model was done in Redex and
can be found here. Currently it only includes reduction
semantics. I plan a separate blog post describing the model in detail.

Overview

Dynamically sized types

The core idea of this proposal is to introduce two new types [T] and
Trait. [T] represents “some number of instances of T laid out
sequentially in memory”, but the exact number is unknown. Trait
represents “some type T that implements the trait Trait”.

Both of these types share the characteristic that they are
existential variants of existing types. That is, there are
corresponding types which would provide the compiler with full static
information. For example, [T] can be thought of as an instance of
[T, ..n] where the constant n is unknown. We might use the more
traditional – but verbose – notation of exists n. [T, ..n] to
describe the type. Similarly, Trait is an instance of some type T
that implements Trait, and hence could be written exists
T:Trait. T. Note that I do not propose adding existential syntax into
Rust; this is simply a way to explain the idea.

These existential types have an important facet in common: their size
is unknown to the compiler. For example, the compiler cannot compute
the size of an instance of [T] because the length of the array is
unknown. Similarly, the compiler cannot compute the size of an
instance of Trait because it doesn’t know what type that really
is. Hence I refer to these types as dynamically sized – because the
size of their instances is not known at compilation time. More often,
I am sloppy and just call them unsized, because everybody knows that
– to a compiler author, at least – compile time is the only
interesting thing, so if we don’t know the size at compile time, it is
equivalent to not knowing it at all.

Restrictions on dynamically sized types

Because the type (and sometimes alignment) of dynamically sized types
is unknown, the compiler imposes various rules that limit how
instances of such types may be used. In general, the idea is that you
can only manipulate an instance of an unsized type via a pointer. So
for example you can have a local variable of type &[T] (pointer to
array of T) but not [T] (array of T).

Pointers to instances of dynamically sized types are fat pointers –
that means that they are two words in size. The secondary word
describes the “missing” information from the type. So a pointer like
&[T] will consist of two words: the actual pointer to the array, and
the length of the array. Similarly, a pointer like &Trait will
consist of two words: the pointer to the object, and a vtable for Trait.

I’ll cover the full restrictions later, but the most pertinent are:

Variables and arguments cannot have dynamically sized types.

Only the last field in a struct may have a dynamically sized type;
the other fields must not. Enum arguments must not have dynamically
sized types.

unsized keyword

Any type parameter which may be instantiated with an unsized type must
be declared using the unsized keyword. This means that <T> is not
the most general definition for a type parameter; rather it should be
<unsized T>.

I originally preferred for all parameters to be unsized by default.
However, it seems that the annotation burden here is very high, so for
the moment we’ve been rejecting this approach.

The unsized keyword crops up in a few unlikely places. One particular
place that surprised me is in the declaration of traits, where we need
a way to annotate whether the Self type may be unsized. It’s not
entirely clear what this syntax should be. trait Foo<unsized Self>,
perhaps? This would rely on Self being a keyword, I suppose. Another
option is trait Foo : unsized, since that is typically where bounds
on Self appear. (TBD)

Bounds in type definitions

Currently we do not permit bounds in type declarations. The reasoning
here was basically that, since a type declaration never invokes
methods, it doesn’t need bounds, and we could mildly simplify things
by leaving them out.

But the DST scheme needs a way to tag type parameters as potentially
unsized, which is a kind of bound (in my mind). Moreover, we
also need bounds to handle destructors, so I think this rule
against bounds in structs is just not tenable.

Once we permit bounds in structs, we have to decide where to enforce
them. My proposal is that we check bounds on the type of every
expression. Another option is just to check bounds on struct
literals; this would be somewhat more efficient and is theoretically
equivalent, since it ensures that you will not be able to create an
instance of a struct that does not meet the struct’s declared
boundaries. However, it fails to check illegal transmute calls.

Creating an instance of a dynamically sized type

Instances of dynamically sized types are obtained by coercing an
existing instance of a statically sized type. In essence, the compiler
simply “forgets” a piece of the static information that it used to
know (such as the length of the vector); in the process, this static
bit of information is converted into a dynamic value and added into
the resulting fat pointer.

Intuition

The most obvious cases to be permitted are coercions from built-in
pointer types, such as &[T, ..n] to &[T] or &T to &Trait
(where T:Trait). Less obvious are the rules to support coercions for
smart pointer types, such as Rc<[T, ..n]> being cast to Rc<[T]>.

This is a bit more complex than it appears at first. There are two
kinds of conversions to consider. This is easiest to explain by example.
Let us consider a possible definition for a reference-counting smart
pointer Rc:

struct Rc<unsized T> {
ptr: *RcData<T>,
// In this example, there is no need of more fields, but
// for purposes of illustration we can imagine that there
// are some additional fields here:
dummy: uint
}
struct RcData<unsized T> {
ref_count: uint,
#[max_alignment] // explained later
data: T,
}

From this definition you can see that a reference-counted pointer
consists of a pointer to an RcData struct. The RcData struct
embeds a reference count followed by the data from the pointer itself.

We wish to permit a type like Rc<[T, ..n]> to be cast to a type like
Rc<[T]>. This is shown in the following code snippet.

let rc1: Rc<[T, ..3]> = ...;
let rc2: Rc<[T]> = rc1 as Rc<[T]>;

What is interesting here is that the type we are casting to, Rc<[T]>,
is not actually a pointer to an unsized type. It is a struct that contains
such a pointer. In other words, we could convert the code fragment above
into something equivalent but somewhat more verbose:
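
Roughly:

let rc1: Rc<[T, ..3]> = ...;
let Rc { ptr: ptr1, dummy: d } = rc1;          // unpack the fields
let ptr2 = ptr1 as *RcData<[T]>;               // thin -> fat pointer cast
let rc2: Rc<[T]> = Rc { ptr: ptr2, dummy: d }; // repack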

In this example, we have unpacked the pointer (and dummy field) out of
the input rc1 and then cast the pointer itself. This second cast,
from ptr1 to ptr2, is a cast from a thin pointer to a fat pointer.
We then repack the data to create the new pointer. The fields in the
new pointer are the same, but because the ptr field has been
converted from a thin pointer to a fat pointer, the offsets of the
dummy field will be adjusted accordingly.

So basically there are two cases to consider. The first is the literal
conversion from thin pointers to fat pointers. This is relatively
simple and is defined only over the builtin pointer types (currently:
&, *, and ~). The second is the conversion of a struct which
contains thin pointer fields into another instance of that same struct
type where fields are fat pointers. The next section defines these
rules in more detail.

Conversion rules

Let’s start with the rule for converting thin pointers into fat
pointers. This is based on the relation Fat(T as U) = v. This
relation says that a pointer to T can be converted to a fat pointer
to U by adding the value v. Afterwards, we’ll define the full
rules that define when T as U is permitted.

Conversion from thin to fat pointers

There are three cases to define the Fat() function. The first rule
Fat-Array permits converting a fixed-length array type [T, ..n]
into the type [T] for an array of unknown length. The second half of the
fat pointer is just the array length in that case.

The second rule Fat-Object permits a pointer to some type T to be
coerced into an object type for the trait Trait. This rule has three
conditions. The first condition is simply that T must implement
Trait, which is fairly obvious. The second condition is that T
itself must be sized. This is less obvious and perhaps a bit
unfortunate, as it means that even if a type like [int] implements
Trait, we cannot create an object from it. This is for
implementation reasons: the representation of an object is always
(pointer, vtable), no matter the type T that the pointer points
at. If T were dynamically sized, then pointer would have to be a
fat pointer – since we do not know T at compile time, we would
have no way of knowing whether pointer was a thin or fat
pointer. What’s worse, the size of fat pointers would be effectively
unbounded. The final condition in the rule is that the type T has a
suitable alignment; this rule may not be necessary. See Appendix A
for more discussion.

The final rule Fat-Struct permits a pointer to a struct type to be
coerced so long as all the fields will still have the same type except
the last one, and the last field will also be a legal coercion. This
rule would therefore permit Fat(RcData<[int, ..3]> as RcData<[int]>)
= 3, for example, but not Fat(RcData<int> as RcData<float>).
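
Written informally, the three cases are roughly:

Fat-Array:  Fat([T, ..n] as [T]) = n

Fat-Object: Fat(T as Trait) = vtable(T, Trait)
            where T implements Trait, T is sized, and T is suitably
            aligned

Fat-Struct: Fat(S<T...> as S<U...>) = Fat(F as G)
            where the two struct types agree on all fields except the
            last, whose types F and G must satisfy Fat(F as G)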

Coercion as a whole

Now that we have defined that Fat() function, we can define the full
coercion relation T as U. This relation states that the type T is
coercible to the type U; I’m just focusing on the DST-related
coercions here, though we do in fact do other coercions. The rough
idea is that the compiler allows coercion not only simple pointers but
also structs that include pointers. This is needed to support smart
pointers, as we’ll see.

The first and simplest rule is the identity rule, which states that we
can “convert” a type T into a type T (this is of course just a
memcpy at runtime – note that if T is affine then coercion consumes the
value being converted):

The next rule states that we can convert a thin pointer into a fat pointer
using the Fat() rule that we described above. For now I’ll just give
the rule for unsafe pointers, but analogous rules can be defined for
borrowed pointers and ~ pointers:

Finally, the third rule states that we can convert a struct R<T...>
into another instance of the same struct R<U...> with different type
parameters, so long as all of its fields are pairwise convertible:

The purpose of this rule is to support smart pointer coercion. Let’s
work out an example to see what I mean. Imagine that I define a smart
pointer type Rc for ref-counted data, building on the RcData type
I introduced earlier:

struct Rc<unsized U> {
data: *RcData<U>
}

Now if I had a ref-counted, fixed-length array of type
Rc<[int, ..3]>, I might want to coerce this into a variable-length
array Rc<[int]>. This is permitted by the Coerce-Struct rule. In
this case, the Rc type is basically a newtyped pointer, so it’s
particularly simple, but we can permit coercions so long as the
individual fields either have the same type or are converted from a
thin pointer into a fat pointer.

These rules as I presented them are strictly concerned with type
checking. The code generation of such casts is fairly
straightforward. The identity relation is a memcpy. The thin-to-fat
pointer conversion consists of copying the thin pointer and adding the
runtime value dictated by the Fat function. The struct conversion is
just a recursive application of these two operations, keeping in mind
that the offset in the destination must be adjusted to account for the
increased size of the fat pointers that are produced.

Working with values of dynamically sized types

Just as with coercions, working with DST values can really be
described in two steps. The first are the builtin operators. These are
only defined over builtin pointer types like &[T]. The second is the
method of converting an instance of a smart pointer type like
RC<[T]> into an instance of a builtin pointer type. We’ll start by
examining the mechanism for converting smart pointers into builtin
pointer types, and then examine the operations themselves.

Side note: custom deref operator

The key ingredient for smart pointer integration is an overloadable
deref operator. I’ll not go into great detail, but the basic idea is
to define various traits that can be implemented by smart pointer
types. The precise details of these types merits a separate post and
is somewhat orthogonal, but let’s just examine the simplest case, the
ImmDeref trait:

trait ImmDeref<unsized T> {
fn deref<'a>(&'a self) -> &'a T;
}

This trait would be implemented by most smart pointer types. The type
parameter T is the type of the smart pointer’s referent. The trait
says that, given a (borrowed) smart pointer instance with lifetime
'a, you can dereference that smart pointer and obtain a borrowed
pointer to the referent with lifetime 'a.

Note: As implied by my wording above, I expect there will be a small
number of deref traits, basically to encapsulate different mutability
behaviors and other characteristics. I’ll go into this in a separate
post.

Indexing into variable length arrays

rustc natively supports indexing into two types: [T] (this is a fat
pointer, where the bounds are known dynamically) and [T, ..n] (a
fixed length array, where the bounds are known statically). In the
first case, the type rules ensure that the [T] value will always be
located as the referent of a fat pointer, and the bounds can be loaded
from the fat pointer and checked dynamically. In the second case, the
bounds are inherent in the type (though they must still be checked,
unless the index is a compile-time constant). In addition, the set of
indexable types can be extended by implementing the Index trait. I
will ignore this for now as it is orthogonal to DST.

To participate in indexing, smart pointer types do not have to do
anything special, they need simply overload deref. For example, given
an expression r[3] where r has type Rc<[T]>, the compiler will
handle it as follows:

1. The type Rc<[T]> is not indexable, so the compiler will attempt to
dereference it as part of the autoderef that is associated with the
indexing operator.

2. Deref succeeds because Rc implements the ImmDeref trait described
above. The type of *r is thus &[T].

3. The type &[T] is not indexable either, but it too can be
dereferenced.

4. We now have an lvalue **r of type [T]. This type is indexable, so
the search completes.

Invoking methods on objects

This works in a similar fashion to indexing. The normal autoderef
process will lead to a type like Rc<Trait> being converted to
&Trait, and from there method dispatch proceeds as normal.

Drop glue and so on

There are no particular challenges here that I can see. When asked to
drop a value of type [T] or Trait, the information in the fat
pointer should be enough for the compiler to proceed.

By way of example, here is how I imagine Drop would be implemented
for Rc:
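
Here is a minimal sketch, reusing the Rc and RcData declarations
assumed in the side note above; drop_glue and free are hypothetical
stand-ins for the compiler’s drop machinery and the allocator:

impl<unsized T> Drop for Rc<T> {
    fn drop(&mut self) {
        unsafe {
            (*self.data).ref_count -= 1;
            if (*self.data).ref_count == 0 {
                // the fat pointer carries the length or vtable
                // needed to drop the unsized referent `t`
                drop_glue(&mut (*self.data).t);
                free(self.data as *u8);
            }
        }
    }
}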

Appendices

A. Alignment for fields of unsized type

There is one interesting subtlety concerning access to fields of type
Trait – in such cases, the alignment of the field’s type is
unknown, which means the compiler cannot statically compute the
field’s offset. There are two options:

Extract the alignment information from the vtable, which is
present, and compute the offset dynamically. More complicated to
codegen but retains maximal flexibility.

Require that the alignment for fields of (potentially) unsized type be
statically specified, or else devise custom alignment rules for such
fields corresponding to the same alignment used by malloc().

The latter is less flexible in that it implies types with greater
alignment requirements cannot be made into objects, and it also
implies that structs with low-alignment payloads, like Rc<u8>, may
be bigger than they need to be, strictly speaking.

The other odd thing about solution #2 is that it implies that a
generic structure follows separate rules from a specific version of that
structure. That is, given declarations like the following:
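
Presumably something along these lines (a sketch; the exact field names
are assumed):

struct Foo1<T>         { x: T }
struct Foo2<unsized U> { x: U }
struct Foo3            { x: int }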

It is currently true that Foo1<int>, Foo2<int>, and Foo3 are
alike in every particular. But under solution 2 the alignment of
Foo2<int> may be greater than Foo1<int> or Foo3 (since the field
x was declared with unsized type U and hence has maximal
alignment).

B. Traits and objects

Currently, we only permit method calls with an object receiver if the
method meets two conditions:

The method does not employ the Self type except as the type of the
receiver.

The method does not have any type parameters.

The reason for the first restriction is that, in an object, we do not
know what value the type parameter Self is instantiated with, and
therefore we cannot type check such a call. The reason for the second
restriction is that we can only put a single function pointer into the
vtable and, under a monomorphization scheme, we potentially need an
infinite number of such methods in the vtable. (We could lift this
restriction if we supported an “erased” type parameter system, but
that’s orthogonal.)

Under a DST-like system, we can easily say that, for any trait Trait
where all methods meet the above restrictions, the dynamically
sized type Trait implements the trait Trait. This seems rather
logical, since the type Trait represents some unknown type T that
implements Trait. We must however respect the above two
restrictions, since we will still be dispatching calls dynamically.

(It might even be simpler, though less flexible, to just say that we
can only create objects for traits that meet the above two
restrictions.)

]]>

2013-12-12T13:07:00-05:00
http://smallcultfollowing.com/babysteps/blog/2013/12/12/structural-arrays-in-typed-objects

Dave Herman and I were tossing around ideas the other day for a
revision of the typed object specification in which we remove nominal
array types. The goal is to address some of the awkwardness that we
have encountered in designing the PJS API due to nominal array types.
I thought I’d try writing it out. This is to some extent a thought
experiment.

Description by example

I’ve had a hard time trying to identify the best way to present the
idea, because it is at once so similar and so unlike what we have
today. So I think I’ll begin by working through examples and then
try to define a more abstract version.

Let’s begin by defining a new struct type to represent pixels:

var Pixel = new StructType({r: uint8, g: uint8,
                            b: uint8, a: uint8});

Today, if we wanted an array of pixels, we’d have to create a new type
to represent that array (new ArrayType(Pixel)). Under the new
system, each type would instead come “pre-equipped” with a
corresponding array type, which can be used to create both single and
multidimensional arrays. This type is accessible under the property
Array. For example, here I create three objects:
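
Perhaps like so (a sketch; the precise Pixel.Array constructor
signature is an assumption):

var pixel = new Pixel();                 // type: Pixel
var row   = new Pixel.Array(1024);       // type: [Pixel : 1024]
var image = new Pixel.Array(1024, 768);  // type: [Pixel : 1024 x 768]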

The first object, pixel, represents just a single pixel. Its type is
simply Pixel. The second object, row, represents a single
dimensional array of 1024 pixels. I denote this using the following
notation [Pixel : 1024]. The third object, image, represents a
two-dimensional array of 1024x768 pixels, which I denote as
[Pixel : 1024 x 768].

No matter what dimensions they have, all arrays are associated
with a single type object. In other words:

objectType(row) === objectType(image) === Pixel.Array

This implies that they share the same prototype as well:

row.__proto__ === image.__proto__ === Pixel.Array.prototype

Whenever you have an instance of an array, such as row or image,
you can access the elements of the array as you would expect:
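
For example (a sketch; the redim method used to produce flat and three
below is hypothetical, standing in for whatever re-dimensioning
operation the API would provide):

row[22].r = 255;                        // one-dimensional access
image[22][44] = row[22];                // two-dimensional access

var flat  = image.redim(1024 * 768);    // same pixels, one dimension
var three = image.redim(1024, 256, 3);  // same pixels, three dimensions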

The variables flat, three, and image all represent pointers into
the same underlying data buffer, but with different dimensions.

Sometimes it is useful to embed an array into a struct. For example,
imagine a type Gradient that embeds two pixel colors:

var Gradient = new StructType({from: Pixel, to: Pixel})

Rather than having two fields, it might be convenient to express this
type using an array of length 2 instead. In the old system, we would
have used a fixed-length array type for this purpose. In the new system,
we invoke the method dim, which produces a dimensioned type:

var Gradient = new StructType({colors: Pixel.dim(2)})

Dimensioned types are very similar to the older fixed-length array
types, except that they are not themselves types. They can only be
used as the specification for a field type. When a dimensioned field is
referenced, the result is an instance of the corresponding array:
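
For example (a sketch):

var g = new Gradient();
objectType(g.colors) === Pixel.Array   // true
g.colors[0].r = 255;                   // g.colors is a [Pixel : 2] instance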

More abstract description

The type T of a typed object can be defined using the following grammar:

T = S | [S : D]
S = scalar | C
D = N | D x N

Here S is what I call a single type. It can either be a scalar
type – like int32, float64, etc. – or a struct, denoted C (to
represent the fact that struct types are defined nominally).

UPDATE: This section has confused a few people. I meant for T to
represent the type of an instance, and hence it includes the specific
dimensions. There would only be one type object for all arrays, so if
we defined a U to represent the set of type objects, it would be S
| [S]. But when you instantiate an array [S] you give it a concrete
dimension. I realize that this is a bit of a confused notion of type,
where I am intermingling the “static” state (“this is an array type”)
and the dynamic portion (“the precise dimensions”). Of course, in this
language we’re defining types dynamically, so the analogy is imprecise
anyway.

For each struct C, there is a struct type definition R, defined
as follows:

R = struct C { (f: T) ... }

Here C is the name of the struct, f is a field name, and T is
the (possibly dimensioned) type of the field.

This description is kind of formal-ish, and it may not be obvious how
to map it to the examples I gave above. Each time a new StructType
instance is created, that instance corresponds to a distinct struct
name C. When a new array instance like image is created, its type
corresponds to [Pixel : 1024 x 768]. This grammar reflects the fact
that struct types are nominal, meaning that the type is tied to a
specific struct type object, but array types are structural – given
the element type and dimensions, we can construct an array type.

Why make this change?

As time goes by we’ve encountered more and more scenarios where the
nominal nature of array types is awkward. The problem is that it seems
very natural to be able to create an array type given the type of the
elements and some dimensions. But in today’s system, because those
array types are distinct objects, creating a new array type is both
heavyweight and semantically significant, since each array type has a new
prototype.

There are a number of examples from the PJS APIs. In fact, we already
did an extensive redesign to accommodate nominal array
types. But let me give you instead an example of some code
that is hard to write in today’s system, and which becomes much easier
in the system I described above.

Intel has been developing some examples that employ PJS APIs to do
transforms on images taken from the camera. Those examples define
a number of filters that are applied (or not applied) as the user
selects. Each filter is just a function that is supplied with some
information about the incoming image as well as the window size and so
on. For example, the filter for detecting faces looks like
this:

I won’t go into the details of how the filter works. For our purposes,
it suffices to say that isskin_parallel computes a (two-dimensional)
array skin that contains, for each pixel, an indicator of whether
the pixel represents “skin” or not. The row_sums computation then
iterates over each row in the image and computes a sum of how many
pixels in that row contain skin. col_sums is similar except that the
value is computed for each column in the image.

Let’s take a closer look at the row_sums computation:

var row_sums = uint32.array(h).buildPar(i => ...);
// ^~~~~~~~~~~~~~~

What I am highlighting here is that this computation begins by
defining a new array type of length h. This is natural because the
height of the image can (potentially) change as the user resizes the
window. This means that we are defining new array types for every
frame.

This is bad for a number of reasons:

It’s inefficient. Creating an array type involves some overhead in the
engine, and if nothing else it means creating two or three objects.

If we wanted to install methods on arrays or something like that,
creating new array types all the time is problematic, since each will
have a distinct prototype.

This pattern seems to come up a lot. Basically, it’s useful to be able
to create arrays when you know the element type and the dimensions,
and right now that means creating new array types.

What do we lose?

Nominal array types can be useful. They allow people to create local
types and attach methods. For example, if some library defines a struct
type Pixel, then (today) some other library could define:
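
Something like this, say (a sketch; the blur method is a hypothetical
example):

var Image = new ArrayType(Pixel);   // a nominal array type, as today
Image.prototype.blur = function() {
    // ...a method attached to this particular array type...
};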

I’m not too worried about this though. You can create wrapper types at
various levels. Most languages I can think of – virtually all – use
a structural approach rather than nominal for arrays (although the
situations are not directly analogous). I think there is a reason for
that.

Is there a compromise?

I suppose we could allow array types to be explicitly instantiated but
keep the other aspects of this approach. This permits users to define
methods and so on. However, it also means that it is not enough to
have the element type and dimension to construct an array instance;
one must instead pass in the array type and dimension.

]]>

2013-12-02T12:32:00-05:00
http://smallcultfollowing.com/babysteps/blog/2013/12/02/thoughts-on-dst-4

Over the Thanksgiving break I’ve been devoting a lot of time to
thinking about DST and Rust’s approach to vector and object types. As
before, this is very much still churning in my mind so I’m just going
to toss out some semi-structured thoughts.

Brief recap

Dynamically sized types (DST). In Part 1 of the series, I
sketched out how “Dynamically Sized Types” might work. In that scheme,
[T] is interpreted as an existential type like
exists N. [T, ..N], and Trait is interpreted as exists
T:Trait. T. The type system ensures that DSTs always appear behind
one of the builtin pointer types, and those pointer types become fat
pointers:
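
Roughly like this (a sketch; the field names are assumed):

&[T]   becomes  { data: *T,  len: uint }    // length carried at runtime
&Trait becomes  { data: *(), vtable: *u8 }  // vtable carried at runtime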

Advantage. Impls for objects and vectors work really well.

Disadvantage. Hard to square with user-defined smart pointers
like RC<[int]>. The problem is worse than I presented in that
post; I’ll elaborate a bit more below.

Statically sized types (SST). In Part 2 of the series, I
sketched out an alternative scheme that I later dubbed “Statically
Sized Types”. In this scheme, in some ways similar to today, [T]
and Trait are not themselves types, but rather shorthands for
existential types where the exists qualifier is moved outside the
smart pointer. For example, ~[T] becomes exists N. ~[T, ..N]. The
scheme does not involve fat pointers; rather, the existential type
carries the length, and the thin pointer is embedded within the
existential type.

Advantage. It is easy to create a type like RC<[int]>
from an existing RC<[int, ..N]> (and, similarly, an
RC<Trait> from an existing RC<T>).

Disadvantage. Incompatible with monomorphization except via
virtual calls. I described part of the problem in
Part 3 of the series. I’ll elaborate a bit more here.

Where does that leave us?

So, basically, we are left with two flawed schemes. In this post I just
want to elaborate on some of the thoughts I had over Thanksgiving.
Roughly speaking they are three:

DST and smart pointer interaction is even less smooth than I thought,
but workable for RC at least.

SSTs, vectors, and smart pointers are just plain unworkable.

SSTs, objects, and smart pointers work out reasonably well.

At the end, I suggest two plausible solutions that seem workable to me at this
point.

Making DST work with RC requires some contortions

In part 1, I gave the example of how we could adapt an RC
type to use smart pointers. I defined the RC type as follows:

struct RC<T> {
    priv data: *T,
    priv ref_count: uint,
}

Unfortunately, as Patrick pointed out on reddit, this
simply doesn’t work. The ref count needs to be shared amongst
all clones of the RC pointer. Embarrassing. Anyway, the correct definition
for RC is more like the following:
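
That is, something along these lines, with the ref count moved into a
shared box (a sketch, reconstructed from the RCData type discussed
below):

struct RC<T> {
    priv data: *mut RCData<T>,
}

struct RCData<T> {
    priv ref_count: uint,
    priv t: T,
}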

In order to be sure that I’m not forgetting details, permit me to
sketch out roughly how an RC implementation would look in actual
code. To start, here is the code to allocate a new RC pointer,
based on an initial value. I’m going to allocate the memory using a
direct call to malloc, both so as to express the “maximally
customized” case and because this will be necessary later on.
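
Roughly like so (a sketch; malloc and move_val_init stand in for the
raw allocation and initialization primitives, and error handling is
elided):

fn new_rc<T>(t: T) -> RC<T> {
    unsafe {
        // allocate space for the ref count and the value together
        let data = malloc(size_of::<RCData<T>>()) as *mut RCData<T>;
        (*data).ref_count = 1;
        move_val_init(&mut (*data).t, t);   // move `t` into place
        RC { data: data }
    }
}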

OK, everything seems reasonable. Only one problem – this whole scheme
is incompatible with DST! To see why, consider again the type
RCData:

struct RCData<T> {
    priv ref_count: uint,
    priv t: T,
}

And, as you can see here, it references T by itself, without using
any kind of pointer indirection. But for T to be unsized, it must
always appear behind a *T or something similar. This is precisely
the example that I showed in the section
Limitation: DSTs must appear behind a pointer in
Part 1.

Now, it turns out we could rewrite RC to make it DST compatible.
The idea is to use the standard trick of storing the reference count
at a negative offset. Let’s write up an RC1 type that shows what I
mean:
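
A sketch of what that might look like (the exact declaration is
assumed):

struct RC1<unsized T> {
    // points directly at the value; the ref count lives at a
    // negative offset before the referent, outside the type proper
    priv data: *mut T,
}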

In this scheme, we have a pointer data directly to a *mut T. This
means that the compiler could “coerce” an RC1<[int, ..3]> into a
RC1<[int]> by expanding data into a fat pointer. It does have the
side-effect of making the code to allocate an RC and manipulate its
ref count a bit more complex, since more pointer arithmetic is
involved.

Here is the code to allocate an RC1 instance. Hopefully it’s fairly
clear. One interesting aspect is that, for allocation, we don’t need
to accept unsized types T, since at allocation time the full type
is known. However, later on, we may “forget” the precise type of T
and convert it into an unsized, existential type like [U] or
Trait. In that case, we still need to be able to find the reference
count, even without knowing the size or alignment of T. Therefore,
we must be conservative and do our calculations based on the maximal
possible alignment requirement for the platform.
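
In code, perhaps something like this (a sketch; malloc, move_val_init,
round_up, and MAX_ALIGN are assumed helpers and constants):

fn new_rc1<T>(t: T) -> RC1<T> {
    unsafe {
        // reserve a maximally aligned header slot for the ref count,
        // so it can be found even after the precise T is forgotten
        let header = round_up(size_of::<uint>(), MAX_ALIGN);
        let base = malloc(header + size_of::<T>());
        *(base as *mut uint) = 1;   // the ref count
        let data = base.offset(header as int) as *mut T;
        move_val_init(&mut *data, t);
        RC1 { data: data }
    }
}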

OK, so we can see that DST does permit RC<[int]>, but only
barely. It makes me nervous. Is this a general enough solution to
scale to future smart pointers? It’s certainly not universal.

Why SST just doesn’t work with vector types.

The SST approach does not employ fat pointers in the same sense and
thus is largely free of the limitations on smart pointer layout that
DST imposes. But not entirely. In part 3 I described the
problem of finding the correct monomorphized instance of deref().
In general, this is not possible, though in many instances the
compiler could deduce that it doesn’t matter which type of pointee
deref() is specialized to – I thus proposed that a solution might
lie in formalizing this idea by permitting a type parameter T to be
labeled erased, which would cause the compiler to guarantee that the
generated code will be identical no matter what type T is
instantiated with. This seems nice, but there are many complications
in practice. Let me sketch them out.

First, it is rare that a type can be entirely erased, even in
dereference routines. For example, consider the straightforward RC
type that I sketched out before, where the header was made explicit in
the representation, rather than being stored at a negative offset. Here is
the Deref routine:
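
Something like the following (a sketch; the ImmDeref-style deref trait
shown here is an assumed placeholder):

impl<erased T> ImmDeref<T> for RC<T> {
    fn deref<'a>(&'a self) -> &'a T {
        // looks erasable, but computing the offset of `t`
        // requires knowing the alignment of T
        unsafe { &(*self.data).t }
    }
}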

At first, it appears that the precise type T is irrelevant, but in
fact we must know its alignment to compute the offset of the field
t. This precise situation is why the alternative scheme RC1 made
conservative assumptions about the alignment of t. We could address
this, though, by manually annotating the alignment of the t field
(something we do not yet support, but ought to in any case):
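
For instance (a sketch; the align attribute syntax shown is
hypothetical):

struct RCData<T> {
    priv ref_count: uint,
    #[align = 16]   // hypothetical: pin `t` to a known alignment
    priv t: T,
}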

A deeper problem lies with the drop routine. The destructor for an
RC<T> needs to do three things, and in a particular order:

Decrement ref count, returning if it is not yet zero.

Drop the value of T that we encapsulate.

Free the memory we allocated.

The tricky part is that step 2 requires knowledge of T. I thought at
first we might be able to finesse this problem by having the
destructor run after the contained data had been freed, but that
doesn’t work because in this case the data is found at the other end
of an unsafe pointer, and the compiler doesn’t traverse that – and
worse, we don’t always want to free the T value of an RC<T>,
only if the ref count is zero.

Despite all the problems with Drop, it’s possible to imagine that we
define some super hacky custom drop protocol for smart pointers that
makes this work. But that’s not enough. There are other operations
that make sense for RC<[T]> types beyond indexing, and they have the
same problems. For example, perhaps I’d like to compare two values
of type RC<[T]> for equality:

fn foo(x: RC<[int]>, y: RC<[int]>) {
    if x == y { ... }
}

This seems reasonable, but we immediately hit the same problem: what
Eq implementation should we use? Can Eq be defined in an “erased”
way? Let’s not forget that Eq is currently defined only between
instances of equal type. This winds up being basically the same
problem as drop – we can only circumvent it by adding a bunch of
specialized logic for comparing existential types.

Another problem lies in the case where the length of a vector is not
statically known. The underlying assumption of all this work is that
a type like ~[T] corresponds to a vector whose length was once
statically known but has been forgotten. We were going to move the
“dynamic length” case to a type like Vec<T>, that supports push()
and so on. But the idea was that Vec<T> should be convertible to
a ~[T] – frozen, if you will – once we were done building it.
And that doesn’t work at all.

Finally, even if we could, we don’t want to generate those
monomorphized variants anyhow. Even if we could overcome all the
above challenges, it’s still silly to have a type like RC<[int]>
delegate to some specific destructor for [int, ..N] for whatever
length N it happens to be. That implies we’re generating code for
every length of the vector that occurs in practice. Not good, and DST
wouldn’t have this problem.

OK, so I hope I’ve convinced you that SST and vector types just do
not mix.

Why SST could work for object types.

You’ll note I was careful not to toss out the baby with the bathwater.
Although SST doesn’t work well with vector types, I think it still has
potential for object types. There are a couple of crucial
differences here:

With object types, we carry a vtable, permitting us to make crucial
operations – like drop – virtual calls.

Object types like RC<Trait> support a much more limited set of operations:

drop;

invoke methods offered by Trait.

There are many ways we could make RC<Trait> work. Here is one
possible scheme that is maximally flexible and does not require the
notion of erased type parameters. When you cast an RC<T> to an
RC<Trait>, we pair it with a vtable. This vtable contains an entry
for drop and an entry for each of the methods in Trait. These
entries are set up to take an RC<T> as input and to handle the
dereferencing etc themselves, delegating to a monomorphic variant
specialized to T. Let me explain by example. First let’s create a
simple trait:
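
Here is a sketch (the trait, a concrete implementing type, and the
generated adaptors; everything beyond the names RC_PC_drop and
RC_PC_hit_points is assumed):

trait Player {
    fn hit_points(&self) -> uint;
}

struct PC { hp: uint }

impl Player for PC {
    fn hit_points(&self) -> uint { self.hp }
}

// vtable entries generated when casting an RC<PC> to RC<Player>;
// each takes the smart pointer itself and derefs internally
fn RC_PC_drop(this: RC<PC>) { /* decrement count, free if zero */ }
fn RC_PC_hit_points(this: &RC<PC>) -> uint {
    this.deref().hit_points()   // deref to &PC, then a static call
}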

Thus, when we convert a RC<PC> to a RC<Player>, we would pair the
RC pointer with a vtable consisting of RC_PC_drop and
RC_PC_hit_points. There are some minor complications to work out
around the various self pointer types, but that seems relatively
straightforward (famous last words). Anyway, the key idea here is to
specialize the vtable routines to the smart pointer type, by moving
the required deref into the generated method itself. This avoids the
need for us to ever invoke code in an erased fashion.

If we added the erased keyword, it could still be used to permit the
reuse of these adaptor methods across distinct pointer types. But this
can also be done without a special keyword as an optimization (unlike
before, it’s not necessary for the type to be erased, merely
helpful).

Squaring the circle

I think we could maybe make DST work, but I still worry it is too
magical. It has some real advantages though so perhaps the right thing
is to try and elaborate more examples of smart pointer types we
anticipate and see whether they can be made to work.

Another solution is to remove vectors from the language,
treat them like any other container, and use the SST approach
for object types. But there are lots of micro-decisions to be made
there, many of which boil down to usability things. For example, what
is the meaning of the literal syntax and so on? I’ll leave those
thoughts for another day.

]]>

2013-11-27T15:06:00-05:00
http://smallcultfollowing.com/babysteps/blog/2013/11/27/thoughts-on-dst-3

After posting part 2 of my DST series, I realized that I had been
focusing too much on the pure “type system” aspect and ignoring some
of the more…mundane semantics, and in particular the impact of
monomorphization. I realize now that – without some further changes
– we would not be able to compile and execute the second proposal
(which I will dub statically sized types (SST) from here on
out). Let me first explain the problem and then share my first
thoughts on how it might be addressed.

The problem

The problem with the SST solution becomes apparent when you think
about how you would compile a dereference *rc of a value rc that
has type exists N. RC<[int, ..N]> (written long-hand). Typing this
dereference is relatively straightforward, but when you think about
the actual code that we generate, things get more complicated.

The problem here is that the way monomorphization currently works,
there will be a different impl generated for RC<[int, ..2]> and
RC<[int, ..3]> and RC<[int, ..4]> and so on. So if we actually try
to generate code, we’ll need to know which of those versions of deref
we ought to call. But all we know that we have a RC<[int, ..N]> for
some unknown N, which is not enough information. What’s frustrating
of course is that it doesn’t actually matter which version we call
– they all generate precisely the same code, and in fact they would
generate the same code regardless of the type T. In some cases, as
an optimization, LLVM or the backend might even collapse these
functions into one, since the code is identical, but we have no way at
present to guarantee that it would do so or to ensure that the
generated code is identical.

A solution

One possible solution for this would be to permit users to mark type
parameters as erased. If a type parameter T is marked erased, the
compiler would enforce restrictions that guarantee that the generated
code will be the same no matter what type T is bound to. This in
turn means the code generator can guarantee that there will only be a
single copy of any function parameterized over T (presuming of
course that the function is not parameterized over other, non-erased
type parameters).

If we apply this notion, then we might rewrite our Deref
implementation for RC as follows:
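
Perhaps along these lines (a sketch; the deref trait shown is an
assumed placeholder):

impl<erased T> Deref<T> for RC<T> {
    fn deref<'a>(&'a self) -> &'a T {
        // with T erased, the compiler checks that this body compiles
        // to identical code no matter what T is, so one copy suffices
        unsafe { &(*self.data).t }
    }
}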

It would be illegal to perform the following actions on an erased parameter T:

Drop a value of type T – that would require that we know what type T is
so we can call the appropriate destructor.

Assign to an lvalue of type T – that would require dropping the previous
value

Invoke methods on values of type T – in other words, erased parameters can
have no bounds.

Take an argument of type T or have a local variable of type T – that would
require knowing how much space to allocate on the stack

Probably a few other things.

But maybe that erases too much…?

For the most part those restrictions are ok, but one in particular
kind of sticks in my craw: how can we handle drops? For example,
imagine we have a value like RC<[~int]>. If this gets dropped,
then we’ll need to recursively free all of the ~int values that are
contained in the vector. Presumably this is handled by having RC<T>
invoking the appropriate “drop glue” (Rust-ese for destructor) for its
type T – but if T is erased, we can’t know which drop glue to
run. And if T is not erased, then when RC<[~int]> is dropped,
we won’t know whether to run the destructor for RC<[~int, ..5]> or
RC<[~int, ..6]> etc. And – of course – it’s wildly wasteful to have
distinct destructors for each possible length of an array.

Erased is the new unsized?

This erased annotation should of course remind you of the unsized
annotation in DST. The two are very similar: they guarantee that the
compiler can generate code even in ignorance of the precise
characteristics of the type in question. The difference is that, with
unsized, the compiler was still generating code specific to each
distinct instantiation of the parameter T, it’s just that one valid
instantiation would be an unsized type [U] (that is, exists
N. [U, ..N]). The compiler knew it could always find the length for
any instance of [U] and thus could generate drop glue and so on.

So perhaps the solution is not to have erased, which says “code
generation knows nothing about T”, but rather some sort of partial
erasure (similar to the way that we erase lifetimes from types at code
generation, which means the code generator can’t distinguish the
lifetimes of two borrowed pointers).

Conclusion

This naturally throws a wrench in the works. I still lean towards the
SST approach, but we’ll have to find the correct variation on erased
that preserves enough type info to run destructors but not so much as
to require distinct copies of the same function for every distinct
vector length. And it seems clear that we don’t get SST “for free”
with no annotation burden at all on smart pointer implementors. As a
positive, having a smarter story about type erasure will help cut down
on code duplication caused by monomorphization.

UPDATE: I realize what I’m writing here isn’t enough. To actually
drop a value of existential type, we’ll need to make use of the
dynamic info – i.e., the length of the vector, or the vtable for the
object. So it’s not enough to say that the type parameter is erased
during drop – or rather drop can’t possibly work with the type
parameter being erased. However, what is somewhat helpful is that
user-defined drops are always a “shallow” drop. In other words, it’s
the compiler’s job (typically) to drop the fields of an object. And
the compiler knows the length of the array etc. In any case, I think
with some effort, we can make this work, but it’s not as simple as
erasing type parameters – we have to be able to tweak the drop
protocol, or perhaps convert “partially erased” type parameters into a
dynamic value (that would be the length, vtable, or just () for
non-existential types) that can be used to permit calls to drop and so
on.