Archive for the ‘Mirah’ Category

In the past two weeks I fixed a bug, wrote a hacked together REPL and ported the new bootclasspath functionality to the newast branch. I also started working on fixing the newast branch’s Java source code generator support, but haven’t gotten very far with that yet.

The bug I fixed, #185, was where macros called in a class body that were defined in the class body were blowing up when generating Java source code.

The problem was that the code generator assumed it was inside a method scope when emitting the code for a scoped body. Macros produce a ScopedBody node, so when a macro was called outside a method, it blew up, because @method was nil. The ScopedBody code was making a bad assumption.

I fixed it by special casing the code to check to see whether it was in a method or not. It’s not very pretty, and I’m sure there’s a better way to do it, but it works and I wrote a test for it. The new code checks @method and optionally wraps the call to super. Ugh.

I feel like there’s got to be a way to restructure the code to make this cleaner.

Hacky REPL

I’d been meaning to try putting together a REPL for Mirah for a few months. I’ve got pages of ideas sketched out. There’s some hard problems in making a nice REPL for a statically typed language and it looked like fun. I thought I’d need to learn about bindings and readline and how IRB works, and I never got started (IRB is kind of neat by the way).

Then, last week, I thought to myself “screw all this planning, how hard would it be to just hack something together? Merb wasn’t built in a day.” All interesting software gets its start as a short, clever hack. Why not make my own? So, I hacked something together in a couple hours.

How does it work? Well, it loops over input–it’s a REPL, waiting for a complete statement and then parses and compiles it. But, it does that in a really hacky way.

First, it doesn’t print each statements value–so it’s not really a REPL, more like a REL, Read Evaluate Line.

Second, it does support multiline statements, but rather than having a nice scanner/partial parser, it just catches syntax errors and eats them if they look like an unfinished statement. Which is pretty crappy.

Finally, and most importantly, it doesn’t have a global binding–this means that you can’t do stuff like this:

because x goes out of scope after the line is evaluated. Fixing this, I think, is the really cool problem in writing a Mirah REPL.

On the other hand, you can define classes and create them, because the REPL uses a shared classloader. So there.

This week AKA plans

I’d like to fix type inference for blocks for methods that are inherited.

This Sunday, I wanted to merge master into newast again. A few bugs have been fixed since then, and I wanted to have the tests for those bugs in the newast branch, since that’s the future. I’d also cleaned up some of the tooling around running commands and wanted those changes merged in. Because I’m stubborn and hard headed I tried to actually merge master into newast, but that wasn’t a great option

There were too many conflicts. Too many. And I couldn’t tell which ones were new changes and which when just things that were different. After making a couple attempts at using a merge and trying to break up the pieces one way or another, I found the answer. _git cherry-pick_ ftw. With cherry pick I could just merge the changes that had happened since the last merge–(https://github.com/mirah/mirah/compare/9a06b834…a86c3651) and not have to worry about all the other differences between the branches.

It worked pretty nicely, I cherry picked most of the changes without problems. Some of the commits that changed code that doesn’t exist any more, or has been converted to using the visitor pattern needed some modifications, but it was really straight forward.

What I didn’t merge in: bootclasspath

The –bootclasspath flag changes touched a number of things that I didn’t want to dig into this week, so I left those out. Fortunately, those changes were well factored, so I didn’t have to change much outside the commits making up that feature. I’m planning on checking that out next week. Check out this commit for more info.

In the future

Next time I work on a bug on master, I’ll try to get both newast and master fixed–if it makes sense. Doing it that way will make sure that they don’t get too out of sync.

First let me explain: in March, Ryan Brown organized the hackathon so we could put some serious work into the newast branch–which is the future of Mirah. What’s the newast branch? It is the beginnings of moving Mirah’s compiler into Mirah, starting with the Abstract Syntax Tree. I’ve done some work with it before, but over the last two weeks I saw a bit more of how it works than before.

So far I like where it’s going. The Ruby version of the compiler used methods on the AST node classes to do transformation, inference and compilation. That’s not great for encapsulation, but is pretty convenient. Newast uses a visitor pattern(http://en.wikipedia.org/wiki/Visitor_pattern), which makes it easier to modularize the compiler. It’s also easier to add new things that walk the tree.

The goals were ambitious

* Documentation: what works, what doesn’t (and should), getting started, etc
* Test writing to fill out gaps (this can kinda go along with documentation)
* Codebase cleanup (on the new newast branch…may need to wait for
Ryan to get it working well)
* Bug tracker triage (close working bugs, try to fix simple ones, file
unreported issues from mailing list)
* Get pindah (Android framework) working (if it is not) and document
how to do it.
* Get dubious (GAE framework) working (if it is not) and document how to do it.
* Port parts of Mirah’s Ruby code into Mirah (this is a goal of newast branch)

To me it seems like the hackathon ended up mostly focusing on

* Codebase cleanup (on the new newast branch…may need to wait for
Ryan to get it working well)
* Port parts of Mirah’s Ruby code into Mirah (this is a goal of newast branch)

As well as, getting some generics support in.

Before the hackathon, newast could run tests, but not that many were passing. Quite a bit of the basic functionality worked, but there was no block support and no macro support. Also, the parser was a pain to build, because it required a special branch off of master to be checked out, and the Rakefile of the parser to be modified to point at it.

After, newast is easier to build, blocks and macros work better and there’s some generics support.

Simple blocks work and simple macros work. Not all the builtins have been added yet, but the pattern for adding more is pretty straight forward.

Also, it’s become pretty easy to get the newast branch running locally. This rocks because it’ll make it that much easier for people to contribute.

Mirah on newast has some support for generics now. Mirah can now take advantage of methods and return types that are parameterized. It can’t define generics itself yet. For more information about the generic support, checkout the wiki page (mirah wiki) on it.

Overall I think the hackathon was a success and we should totally do it again.

The saga continues. Last week I thought I’d shaved the yak, but this week I found it still very hairy. Last week, I got closures in closures to get past the inference stage, but silly me, I didn’t actually try to compile or run them. Haha. Turns out they still don’t work.

This Sunday I took another crack at creating view closures in Shatner. That’s where I discovered this little mess. Shatner is a great test bed for Mirah because it does crazy things with closures and macros. It’s also kind of irritating to debug because it does crazy things with closures and macros.

My test app looked like this:

I was trying to build closures for the view so it could share variable scope with the responding block. This particular example is pretty dumb–it doesn’t even have any variables to share. But it didn’t work and gave me the following stacktrace.

Bindings are how Mirah shares state between closures. It’s pretty neat how it works actually. The compiler determines what local variables are shared between the outer scope and the closure and creates a binding class to hold them. Then both the outer and inner scope use the binding object instead of local variables. That’s how it’s supposed to work anyway.

What Mirah was trying to do, was to ask the closure’s scope’s defining class for a binding type–a class definition for the shared binding, but the scope didn’t have a defining class. The scope of the closure was the static scope for SomeAppWithAnUnmacroedView, which didn’t have a defining_class because it’s the wrong sort of scope to have one. I think that a static scope for a class body doesn’t have a defining_class because it doesn’t belong to an instance of a class, but I’m not exactly sure.

SomeAppWithAnUnmacroedView’s scope was the wrong scope because the block had been moved to the class’s initializer by Shatner. The get macro takes the passed block and appends it into the initializer. Moving it caused a problem because Mirah currently caches the scope of a scoped node to avoid having to do look up every time. This is fine usually, but because the macro had moved the block its scope should have changed. I unmemoized the scope by changing ||= to = to see what would happen (lib/mirah/ast/scope.rb#L20) .

We’re making progress. Now the error happens in a different place. ScopedBody nodes don’t have a defining_class method. Maybe they should. I tried adding one, following the pattern I’d seen elsewhere.

And got a new stacktrace. This is great–it made it past inference and tried generating bytecode. And failed. But we’re still moving forward. Where the exception is raised, it looks like we’re trying to get a binding out of a hash, but the hash is nil (lib/mirah/jvm/compiler/jvm_bytecode.rb#L529). Going up a level, we’re asking the parent compiler for the binding with the passed name and we’re not finding it (lib/mirah/jvm/compiler/jvm_bytecode.rb#L827) . Why? The parent is another ClosureCompiler and they don’t have bindings on them.

So I added bindings to the closure compiler by calling super. The Base compiler class creates the binding hash used by most things, so I thought I just needed closures to do that too.

Hooray it compiled! And we got a new and interesting error!

The new problem is that the code being generated is invalid. Ugh. Regenerating with -j shows us that something’s wrong with how things are organized. Argh!

And that’s as far as I got this week. I learned a quite a lot more about Mirah’s closure support and lack there of and got pretty far just following the stacktraces. I think I’ll do some more digging next week.

Way back in November, four months ago, I embarked on an adventure. I wanted to render views in Shatner in a friendly way. Unfortunately, at the time Mirah didn’t support defining closures inside other closures(#155). This is a big problem for all sorts of interesting use cases.

In my particular use case, I was trying to use a closure to get around another issue with Mirah’s edb plugin, where it didn’t like being passed unquotes (#152). My thought was that I could use edb to generate a render method on a closure class that would represent the view. That way, the view would have access to the environment inside the get block through the binding object that Mirah generates as part of how it handles closure creation.

The problem was that blocks that represent closures didn’t have the right kind of back reference to the ClosureDefinition that was added to them by the transformer. When the transformer generated the ClosureDefinition, it didn’t tell the block about it. Because the block didn’t know, it couldn’t tell closures inside it what scope they were in.

A ClosureDefinition node is an AST node representing the class definition of a closure. The code generator uses these nodes, along with their attached blocks to make those Class$1234.class files you see when you have closures in Java.

An Example

During the transform phase, the transformer adds a ClosureDefinition node for Block 1 that uses the outer class for it’s scope. It creates a constructor for the new ClosureDefinition. The constructor takes a binding that it will share with the enclosing scope.

Then the transformer looks at the body of Block 1, to use it to create method definitions on the ClosureDefinition for the abstract methods of the type that the Block is implementing. For Block 1, that’s run.

It gets to Block 2, and tries to create a ClosureDefinition for it. But that fails, because Block 1 doesn’t know its own type. It doesn’t have a reference back to the its ClosureDefinition.

The Fix

The error you would get with this was "undefined method `defining_class' for #<Mirah::AST::Block:0x2f267610>". This was because most scoped body types had a defining_class method that pointed at the AST node representing the class they were defined on. Blocks didn’t. The way I fixed it was by adding a defining_class method on Block and initializing the instance variable it pointed to with the ClosureDefinition created during the transform step.

Plans

This weekend I’m hoping to either get back to working on Shatner, or to start working on a REPL for Mirah. I’ve already learned a little bit about how Mirah’s binding generation works–to build a REPL, I’d need to master it.

New AST

Also, a few weeks ago I finally merged master into the newast branch. The newast branch is where ribrdb has been working on making Mirah more self-hosted, which should make it much faster, as well as providing a good place to work out some of the edge cases in the language. Merging master is important because otherwise the more experimental branch will diverge too much & become harder to merge back in later.

This week I wanted to deal with one embarrassing bug, and some that looked fun. After looking through everything last week, I felt like I got a better sense about the current issues. So, I picked a few issues to work on that I thought I could dig into and get something working in an afternoon. I think I was a little ambitious.

First I wanted to fix #161mirah -v causing an error because, frankly it’s embarrassing. I also thought it’d be pretty easy. It was, but the fix wasn’t as clean as I’d like.

Next time I’m in the option handling code, I’m going to do some house cleaning–it’s getting a bit ugly. For instance, parser should not take a list of commandline args and have to worry about ‘-e’ etc. How to reorganize it I haven’t decided. Maybe using optparse, though that has its own idiosyncrasies.

I fixed this by adding an intrinsic to the jvm backend code. It’s not especially pretty. The intrinsics code is tied up in a few files and some of them feel like balls of mud. It’s not readily apparent where to find things and what they do.

I’ve been thinking about how to make intrinsics nicer and more consistent. It might be nice to have some common types with a common set of expected method definitions that the various (hypothetical) backends should implement. That way you’d have some consistency across different Mirah backends.

I’d also like to reorganize the intrinsics and make their internal APIs easier to grok. Things I’d like to do with them, like allowing easy method aliasing are hard right now, because the API has a lot of sharp edge cases.

Test cleanup

After fixing #13, I renamed all the test files. I was getting frustrated w/ having them all be prefixed with test_. That made tabbing into the right test file take a couple more keystrokes. It added just enough friction that I wanted to change it. So I did.

This one is definitely not a single Sunday afternoon project. And, honestly I didn’t expect it to be. I spent about an hour trying to figure out what would need to change to support creating constants. It looks kind of annoying.

First, you need to transform the mmeta AST nodes into a Mirah AST node for the Const Assign, which could be as simple as creating a static FieldAssign. Then you need to change the code generation to deal with that. Unfortunately, FieldAssign’s don’t currently know about access levels (public, private, protected) which means you’d either have to add access levels to field assign, or create a new AST class that would have to be dealt with in the code generation phase.

Thinking about it a little, it might be nicer to do the second thing. Then, different (hypothetical) backends could handle access levels for constants their own way. That might be handy, particularly if constants are special in different ways in different languages.

The other thing I looked at briefly was adding ++ to the grammar. This turned out to be rather hard looking because the parser’s master branch is tied to mirah’s newast branch, which has a lot of new things in it and doesn’t work with the current release yet.

I’m debating creating a new branch on the parser at a point before it started using the new AST. On the other hand, it might make sense to spend more time trying to update the newast branch so that it is up to date with master. Then I could get it closer to merging back in.

This week I decided to go and read all of Mirah’s open issues starting with the earliest submitted ones. I’d been spending so much time just looking at the top of the list. I didn’t have a good sense for which ones were duplicates, which were pretty undefined and which were easy. Thankfully, Mirah doesn’t have that many issues, so attempting to read through them all wasn’t a ridiculous undertaking. There were only 64, and I had filed about a half dozen of them myself.

In the end, I was able to close 7, and get a good sense of where a number of the other ones stood in terms of how much work it’ll take to fix them. Of course there were a few that made me confused, where I don’t have any idea where to start digging to fix them.

After going through the stack of issues, I have a better sense of what I want to try to tackle next week. Here’s a few that I think look fun.

++ isn’t in the grammar yet, so this would require learning more about the parser. And, I’d get to play around with the grammar, which is always fun, and something I haven’t done much of since graduating.

Currently Mirah uses Java’s ==, which checks identity not equality. We’d like to use Java’s equals as our ==, but I’ve had a little trouble getting it working properly. Still, this is a fun one to hack on.

This looks fun. I’m not quite sure how to make it work. The problem is that you want blocks to work like they do in Ruby, where self is the owner of the outer scope. Currently, Mirah’s bindings don’t do that, so self is the instance of the closure. Looking at this got me thinking about non local return(NLR). There was an interesting discussion about it on Twitter today with @evanphx saying

I think it’d be really interesting to try to add NLR to Mirah’s blocks. It might be crazy, but it could also be awesome.

#69 – Mirah doesn’t check blocks signature against the signature of the method they’re supposed to be implementing.

This week I looked at doing views in Shatner again. I had thought of a new approach to get around def_edb‘s issues with unquotes. That quickly devolved into an exploration of a part of the inference engine I hadn’t played with yet.

I’d been looking for a good way to share state between views and the ShatnerApp. Initially, my plan was to just have the view be a method on the Shatner app, just like how Dubious currently does it. But, that’s pretty pedestrian and totally wouldn’t cause things to blow up. Blowing things up in Mirah is one of the reasons I work on Shatner.

I thought maybe I could build view objects as closures. That way they’d have references to all the variables defined in the outer scope, but they could still be separate classes and have separate methods ala Rails’ view helpers.

And maybe it’d be easier to build a macro that just modifies the code in place to implement something like (gist):

What I discovered was that Mirah currently doesn’t support closures being defined in closures (issue #155). They don’t work because they can’t figure out what the outer scope is.

When a closure is created in Mirah, it begins life as a Mirah::AST::Block. However, it doesn’t stay just a block for long. After the parse phase is over, it is transformed.

The typer figures out what method is being called with the block, and figures out what the block’s interface is. It generates AST nodes for a class that implements that interface and dumps the class into the tree of the outer class. It then adds AST nodes at the call site for creating a new instance of that freshly minted closure class and carries on typing.

The problem is that the AST nodes inside the block still think their parent is the block. This is only a problem when you have another block inside a block because a Mirah::AST::Block doesn’t have a class associated with it, so the closure generated inside the closure doesn’t know what contains it.

I still haven’t quite figured out how to resolve this. I feel like there’s probably some AST munging needed to get it to work properly, but I’m not sure what operations need to be taken.

It’s a very interesting problem though.

The other thing I looked at was getting == to use equals. Earlier, I’d tried copying String#+ doing something like (gist):

Which worked for ==, but not for !=, because I couldn’t figure out how to negate the call.

So, I tried the method used by kind_of?, which creates some AST nodes and shoves them back into the tree. That worked for the Java source generator, but not the Bytecode generator. Which was kind of odd.

Clearly I don’t know enough about how the bytecode generator works. Maybe I’ll look into that some next week.

This week I decided to focus just on bugs. I picked a few that looked interesting and set about investigating.

I decided to look at #108, #150, #151, #68 and #52.

108 I realized, I hadn’t thought enough about. It’s really a design thing. Do you want to require that your closures match signatures with the methods the implement? What if you don’t care about the arguments because everything you want to do has to do with the closed scope and not with what you’re passed? I could see it both ways. On the one hand, requiring a matching signature ensures all the types and whatnot are correct. On the other hand, if you aren’t using the passed arguments, they are just boilerplate code.

I think we should allow closures to have no arguments where the method they are implementing has them for niceness. But it should be an all or nothing thing. Either no parameters, or parameters matching the signature of the method you expect to be implementing.

#52

I spent most of my time on this one. It seemed straight forward at first, but had a number of tricky bits that took some working out. The problem was that if you had a begin rescue end block, and treated it as an expression if the result types of the block itself and the rescue clauses differed, Mirah wouldn’t raise an exception, but the JVM would. For example (gist):

foo either is an int or a String, which would be fine in Ruby–but not in Mirah because Mirah has static typing.

So, I set about fixing it. The first thing was to define the valid behavior. What I came up with was this:

if a begin rescue block is treated as an expression, check the compatibility of its result types

if it’s not an expression, don’t check the result types

The second one is nicer because it lets you not care when you don’t want to care. For example (gist):

If we checked result types in both expressions and statements, the above would fail to compile, which would suck.

I started by writing a couple tests of the expected behavior. I wanted to get an inference error when I used the rescue as an expression, but not as a statement.

Then I went looking for other, similar pieces of functionality that I could use for a guide. I found that if elses also follow this pattern, so I wanted to look at how If AST nodes were inferred, trying to get a handle on how I would change rescues. I grepped through lib/ for If’s infer method and found it in lib/mirah/ast/flow.rb. What it does is get the if body’s result type, and the else’s result type and check to see if they are compatible (gist).

Rescue inference was a little more complicated than If because rescue’s have a lot more pieces. First, you have the body between the begin and the first rescue. Then, there can be multiple rescue clauses, each handling a different set of exceptions. Finally, if there’s an else, it runs after the body if there is no exception. There’s also ensure which is run after, but doesn’t return anything, so it shouldn’t affect result type comparisons (need to write a test for that …).

So, the types that need to be compatible are the that of the body if there isn’t an else or the else’s and all the rescue clauses. Kind of complicated. But pretty doable. I modified the Rescue node’s infer method to test compatibility of the body or the else and all the clauses, which worked for the tests I wrote but broke other tests.

It broke other tests because of how deferred inference works. When inference is reattempted, the typer assumes that the deferred node was evaluated as an expression, which breaks when it is a Rescue because its inference depends on how it was used.

So as a work around, I added an ivar to the Rescue node to cache how it was initially inferred (lib/mirah/flow.rb).

It’s a hack, but it resolved the issue. I’d like to look into how to change things to make expression vs statementness be something defined in the AST instead of decided at inference time, or make it more explicit that the expression-ness is determined by the parent node during inference.

This week I wanted to get back to my pet Sinatra clone–Shatner. Last time I tried working on it, I was trying to figure out how to get views working like they do in Ruby. I want to end up with a dsl like this(gist):

Where you can call edb :view_name just like you call erb :view_name in Sinatra, and it figures out what you want.

I tried to do this by injecting a call to the def_edb macro into the class’s AST node. That didn’t work. Because def_edb was getting expanded before the outer macro, instead of passing it a proper AST node representing a method name, I was passing it an Unquote, which it didn’t know what to do with.

This looked like a rather difficult thing to untangle, so I gave up on it for now and file a bug (#152).

Somewhere the AST node for the body of the block was being set to nil. This caused some later code to blow up when it attempted to iterate over the block’s children. I fixed it by ensuring that the body of a block is always set to some value.

I looked at some pretty interesting bits of the compiler while fixing this. Did you know that Mirah’s block support works in two ways? You can use it like you would in JRuby, and pass a block in place of an interface implementation, which is cool. And, you can also pass it a block that contains the definitions of the methods you want to implement. For example (gist)

#108 closure generation

I was able to reproduce this, and figure out roughly what’s going on, but haven’t fixed it yet. The problem is that we’re not generating the signature of the method a block passed as an interface correctly all the time. When the block has no arguments, but the method it’s supposed to implement does, Mirah generates a method with the same name but no arguments. Problem.

#109 incorrectly attaching macros to the wrong node type

I reproduced this but didn’t look into it. It’s interesting it only happens when the macro is used somewhere else in the code.

#123 ICE from methods that have same name as previously defined macro

The problem here was that a macro’s return type is InlineCode, and the MethodDefinition node didn’t know how to validate it’s own signature against a macro with the same name. I put in a condition that will cause inference to fail with an error that says the method has the same name as a macro. It’s not particularly elegant, but I don’t think it’s a bad thing for the compiler to do.