Tuesday, July 22, 2008

Statements as expressions

As most Lua users and every Metalua users know, Lua makes a difference between expressions and statements, and it's sometimes cumbersome to put a statement where the context expects an expression. This can be addressed in Lua by using a function closure, but at a considerable runtime cost.
Metalua introduces a `Stat{ } AST node which lets metaprogrammers introduce a statement in an expression's context, but no concrete syntax was provided, partly to discourage the use of stats-as-exprs (which often is of poor taste), partly because I couldn't come up with a satisfying syntax. In this post I'll propose a syntax that seems better, if trickier to implement, than anything I've come by until now.
`Stat{ block, expr } is an expression node, which evaluates the block, then evaluates the expression in the block's context (therefore having access to its local variables; that's also how "repeat foo until bar" works in Lua, where condition "bar" has access to "foo"'s local variables). The node's expression value is the second child's value. For instance, the following AST, when executed, prints 42:

print( -{ `Stat{ +{block: local x = 21}, +{expr: 2*x} })

I propose to present stat node between keyword "stat" ... "end". In that block, the last statement should be a return, and its value is the returned expression. The example above would be written: print(stat local x=21; return 2*x end).
This is easily done by analyzing the content of the block and extracting the content of the return statement:

Walking a block to extract returns (which are not inside a nested function) has already been addressed multiple times, e.g. here. Now what happens when no return or multiple returns are found? If there's no return, it's easy, just return nil. So above, we simply initialize returned_expr at +{nil}. In case of multiple returns, we could fail at compile time, but we can actually do better, and rewrite this as follows (supposing that goto and label are supported by the syntax):

print(stat
local x=21;
if foo then return foo else return 2*x end
end)

This way, multiple returns are handled gracefully. Moreover, this code has a semantically equivalent (although much less efficient) Lua implementation: "stat block end" can be translated into the closure "((function() block end)())".
The complete implementation looks like this:

The only limitation of the complete version is that multiple returns inside a stat blocks aren't supported. I'd rather keep it that way: the implementation of `Stat{ } is already a hack, since Lua really dislikes statements-as-expressions, at low level as well as at the syntax level. Making it multiple-values compatible would be horribly messy for little added value.
However, it could be handled through rewriting: when at least one return in the block might have multiple values (that means, all returns with a comma, and all returns of a function call), we can:

Change all "return foo" statements into "return {foo}".

Change the default return value from +{nil} to +{{}}

Surround the `Stat{ } node with a call to unpack().

Definitely not worth the kludge, nor the performance hit when treating a statement of the form "return f()", in my opinion. However, just because I can, here is the code:

11 comments:

Anonymous
said...

There must be some obvious reason why this doesn't remove the necessity of writing a LuaJIT 2.0 backend for metalua, but I don't see it.Isn't the AST generated from compiling some metalua code (i.e. metalua itself) just regular old lua AST? So you could get back to 'macro free' source from the AST and then hand that to luaJIT?If that was the case, problem solved? That would be very nice.Perhaps metalua->AST->macro free source->luaJIT->luaJIT bytecode of metalua = super fast awesomeness?

There's no "regular old lua AST", the original compiler goes from source to bytecode in a single pass. Therefore the compatibility between Lua and metalua occurs at the bytecode level, not at AST level.

Moreover, in metalua's AST, there are some nodes (`Goto{ } and `Stat{ }), which cannot be expressed in regular Lua. They might break some invariants LuaJIT depends upon. But there is a large subset of metalua that is LuaJIT-compatible, as long as you refrain from using problematic nodes, or that you offer alternative implementations which don't rely on them.

I've never seriously investigated the issue of LuaJIT compatibility, because in my experience, rewriting a couple of key hot spots in C has always been a very efficient way to make Lua fast enough. But I can help someone who needs it bad enough to do most of the work by himself ;-)

I'm not sure what the current status of LuaJIT is, or how deeply metalua needs hooks into the compilation process in order to work.

LuaJIT 2.0, when released, will evidently use incompatible bytecode. LuaJIT 1.x will mostly run any standard Lua bytecode.

There's also llvm-lua, which does some JIT compilation, though it can't do the same runtime optimizations that LuaJIT can. It compiles from Lua bytecodes into LLVM's intermediate format and works from there.

I've been toying with metalua and finally grasping it well enough to use it in my work. It's a great language.

The JIT options really interest me; Lua is already among the fastest interpreted languages, but it's always nice to go faster. It looks like llvm-lua might fit best with Metalua, as it works with standard bytecode.

@kozmikyak: when LuaJIT's bytecode format will be released, there will be some work to port a compiler for it.

Meanwhile, there are very few instructions (`Goto{ }, `Label{ } and `Stat{ }) which can't be represented as Lua source code; so by avoiding these, or removing them through a bit of graph analysis, we'll be able to compile metalua source --> plain lua source --> luaJIT bytecode.