mutable ast?

On 11-03-15 10:29 AM, Rafael Avila de Espindola wrote:

> When handling
>
> fn foo() {
> fn zed(bar z) {
> }
> tag bar {
> nil;
> }
> fn baz() {
> zed(nil);
> }
> }
>
> we currently do two passes in trans.rs. The first pass collects the tags
> so that they can be used when collecting zed.
>
> One way to avoid this is to have ty_tag point to the tag item. The
> problem if we do that is that we get a cycle:
>
>
> item_tag -> vec[variant]
> variant -> ann
> ann -> middle.ty.t (ty_tag)
> middle.ty.t -> item_tag
>
> so we would have to make one of the links mutable :-(
>
> Do you guys think it is worth it? Which link do you prefer? Maybe you
> have another idea to avoid the multiple passes without introducing the
> cycle?

We discussed this on IRC, but I'll follow up here to summarize (it seems
our IRC network will not be logged any time soon; I'm working on
figuring out whether that can be made so).

My "short answer" was that I'm not interested in making the ast or ty.t
types mutable+cyclic, because it introduces a lot of potential
difficulty elsewhere (moves it to the gc layer, possibly undermines
parallelism, makes other code have to rebuild direct pointers properly,
makes folds complicated-to-impossible, depending on link).

We then had a longer discussion about whether there were any possible
alternatives, in particular trying to devise ways of addressing
potential performance costs of (a) constantly rebuilding the tree and
(b) having to keep it acyclic due to immutability.

For (a) I suggested a visitor for each node but does *not* rebuild, but
looks and possibly builds worklists for interesting subsets of the AST.
Somewhat like fold; in fact fold originally was built with the ability
to function this way; but it involved taking some 20-odd type parameters
to unify the roles and was quite cumbersome. Probably just a separate
module that walks an obj type through a tree will be sufficient.[1]

For (b) I had no ideas, but espindola suggested that a gc root could be
floated out to the maximum acyclic node in a constant structure;
pcwalton pointed out that the freeze/thaw scheme we've been discussing
for the future may well be able to integrate this with a sort of managed
weak pointer for the interior nodes, which immediately made me think of
the const refcount, and the similarity this idea has with the ideas
we've been pushing around for handling single-writer / multi-reader
multithreading as an optimized pattern for large frozen structures.

Ultimately we agreed (I think!) to defer the concept to when we're
implementing 'freeze', with the understanding that it would be very
desirable to be able to freeze not just mutables but *cyclic* mutables
into an immutable structure. So possibly we wind up with the output from
freeze being *const*, with only the root being refcounted-immutable. Or
something along these lines.

Anyway, that's a summary of how I saw the conversation unfold; correct
anything you see as wrong in this summary. It got a little heated and I
don't want to misrepresent anyone's opinion.

-Graydon

[1] Fold should really be an obj eventually; we just didn't have
obj-extension working in rustboot at the time

mutable ast?

> Anyway, that's a summary of how I saw the conversation unfold; correct
> anything you see as wrong in this summary. It got a little heated and I
> don't want to misrepresent anyone's opinion.

I think you just missed one of the points why I wanted to introduce a
cycle. The structure is semantically cyclic. We just represent one of
the edges with "these two integers are the same". Having a real pointer
makes it easier to understand.

This also got to the question of how to control what can mutate a data
structure. In clang (which is written is C++ and compiles C++), it is
really important that random parts of the FE don't mutate the AST since
changes to it can invalidate things like C++'s crazy name lookup rules.
To have that and at the same time allow for cycles, the AST is a friend
of the parts that construct it. Which is ugly, but does allow them to be
sure about which parts of the code can mutate the AST.

Sorry for bringing even harder questions, but this also got me thinking
about one of the nice features of C/C++: the ability to represent not
only cycles, but arbitrary memory layout. Is there a plan to implement
something similar in rust?

For example, if I implement a JS interpreter or JIT in C/C++, I can give
its data structures any layout that is appropriate for the JS
implementation. If, for example, a garbage collector is used for the JS
objects they don't need a reference count, but I can still get a pointer
to it from the C++ code.

I am sure this is not safe in general, but it would be sad to have to go
back to C/C++ for any unsafe operation.

An example that is probably closer to the issues we will have after
bootstrap: can rust's garbage collector be written in rust?

> -Graydon
>
> [1] Fold should really be an obj eventually; we just didn't have
> obj-extension working in rustboot at the time

mutable ast?

On Tue, Mar 15, 2011 at 2:17 PM, Rafael Avila de Espindola
<respindola at mozilla.com> wrote:
> Sorry for bringing even harder questions, but this also got me thinking
> about one of the nice features of C/C++: the ability to represent not only
> cycles, but arbitrary memory layout. Is there a plan to implement something
> similar in rust?

There's no reason arbitrary memory layout can't be done safely -- not
to toot my own group's horn here, but Habit[0][1][2] has features to
look at if you're interested in that.

>
> For example, if I implement a JS interpreter or JIT in C/C++, I can give its
> data structures any layout that is appropriate for the JS implementation.
> If, for example, a garbage collector is used for the JS objects they don't
> need a reference count, but I can still get a pointer to it from the C++
> code.
>
> I am sure this is not safe in general, but it would be sad to have to go
> back to C/C++ for any unsafe operation.
>
> An example that is probably closer to the issues we will have after
> bootstrap: can rust's garbage collector be written in rust?
>

In general, you can't implement a garbage collector in a typed
language, because re-using memory for a value of a different type is
inherently unsafe. There are fancy type systems that can get around
that, but probably nothing that's likely to be in Rust... unless the
typestate system could be used for that?

mutable ast?

On 11-03-15 05:24 PM, Tim Chevalier wrote:

> On Tue, Mar 15, 2011 at 2:17 PM, Rafael Avila de Espindola
> <respindola at mozilla.com> wrote:
>> Sorry for bringing even harder questions, but this also got me thinking
>> about one of the nice features of C/C++: the ability to represent not only
>> cycles, but arbitrary memory layout. Is there a plan to implement something
>> similar in rust?
>
> There's no reason arbitrary memory layout can't be done safely -- not
> to toot my own group's horn here, but Habit[0][1][2] has features to
> look at if you're interested in that.
>

Well, depends on how far you go with "arbitrary". Before thinking about
the garbage collector what I had in mind was a dynamic linker. In that
case you have to apply relocations to get a valid function pointer that
is then called.

mutable ast?

On 11-03-15 02:17 PM, Rafael Avila de Espindola wrote:
>> Anyway, that's a summary of how I saw the conversation unfold; correct
>> anything you see as wrong in this summary. It got a little heated and I
>> don't want to misrepresent anyone's opinion.
>
> I think you just missed one of the points why I wanted to introduce a
> cycle. The structure is semantically cyclic. We just represent one of
> the edges with "these two integers are the same". Having a real pointer
> makes it easier to understand.

Yeah, I gather that. And you're totally right, a pointer would, in the
context you're discussing, be easier to understand. It'd just shift a
lot of cost elsewhere, a hit I'm not comfortable taking just now.

> This also got to the question of how to control what can mutate a data
> structure. In clang (which is written is C++ and compiles C++), it is
> really important that random parts of the FE don't mutate the AST since
> changes to it can invalidate things like C++'s crazy name lookup rules.
> To have that and at the same time allow for cycles, the AST is a friend
> of the parts that construct it. Which is ugly, but does allow them to be
> sure about which parts of the code can mutate the AST.

Yeah. Some freeze/thaw logic ought to be able to play in similar fields
as that. I hope.

> Sorry for bringing even harder questions,

Oh, no need to apologize; the harder questions were the more interesting
ones! ("how do we get cyclic immutables", say; I really hope the
const-blob-with-a-refcounted-root idea works!)

but this also got me thinking
> about one of the nice features of C/C++: the ability to represent not
> only cycles, but arbitrary memory layout. Is there a plan to implement
> something similar in rust?

Arbitrary is such an arbitrary word :)

I'm not sure how to answer it in general. "Represent" in what sense? A
pointer is just a number. Since we permit a modest amount of fiddling
around with raw pointers from C-land as numbers, I imagine you have
something more explicit in mind.

> I am sure this is not safe in general, but it would be sad to have to go
> back to C/C++ for any unsafe operation.

My general sense is that a small number of C/C++ operations + a majority
of connective code in rust manipulating opaque native pointers (possibly
through resources and tags that wrap them) is the best we can do. Or at
least the best we can do without totally blowing the cognitive budget of
the language (no typed assembly language and memory regions, thanks).

> An example that is probably closer to the issues we will have after
> bootstrap: can rust's garbage collector be written in rust?

Yeah, but it'll involve a number of unsafe calls. I figure with a few
unsafe primitives, the rest is mostly feasible in rust.

I wrote the one in rustboot in asm (actually in the mutant IL/x86
pseudo-asm) but that was mostly for expedience; didn't have to teach
trans to look through itself. It's actually not a very big algorithm.
You mark and you sweep. Mostly through lists. The whole thing runs on a
task's own stack so there's no need to play sneaky scheduling games.

mutable ast?

On 3/15/11 2:17 PM, Rafael Avila de Espindola wrote:
> I am sure this is not safe in general, but it would be sad to have to go
> back to C/C++ for any unsafe operation.

I think we can make do with a small number of unsafe primitives: "peek"
and "poke" to get started, and some sort of unsafe cast operation to
cast between blobs of memory and Rust records. Rust records are
fortunately laid out more or less exactly the same as C structs.

mutable ast?

> Arbitrary is such an arbitrary word :)
>
> I'm not sure how to answer it in general. "Represent" in what sense? A
> pointer is just a number. Since we permit a modest amount of fiddling
> around with raw pointers from C-land as numbers, I imagine you have
> something more explicit in mind.

Not really. Anything you can draw in a white board can be coded in C++,
sometimes in a not very elegant way. An example that comes to mind (from
llvm) is using placement new to put a string (char*, not std::string)
after a regular struct. With most other languages you would need an
extra pointer and a separate allocation.

Another example would be a mapping from string to an arbitrary value.
One datastructure that is used for that in LLVM uses a single buffer it
owns the strings. It is a lot faster than a non specialised hash table
with pointers to externally allocated strings.

I am not sure if a lot is gained by requiring users to go to C to get
this. I agree it is easier to design, but I wonder if some form of
unsafe blocks could be provided. The total safety of the program is not
decreased by requiring a combination of a safe and an unsafe language :-)

mutable ast?

> I think we can make do with a small number of unsafe primitives: "peek"
> and "poke" to get started, and some sort of unsafe cast operation to
> cast between blobs of memory and Rust records. Rust records are
> fortunately laid out more or less exactly the same as C structs.
>
> Naturally, all of these would mark the caller unsafe.

Perfect. Yes, having unsafe operations in rust itself should make it
possible to implement some of the optimised data structures used in C or
reflect on its own state (for a GC).