Wednesday, February 20, 2013

Syntax checking in Scala String Interpolators

In the second post of this
series
I showed how you can write your own string interpolators for Scala
2.10.
The examples were designed to work regardless of the content of the
processed string.
In many other cases, we care what the string looks like and its form
affects what we want to do.
Often we want to complain if the content of the string is not
acceptable.
We can complain with either a run-time error or a compile-time one.
The latter case requires us to extend the compiler, which I will do
using 2.10’s experimental macro features.

Octal number literals

Suppose that we want to use the notation o"177" to stand for an
integer literal whose value is specified in octal (base 8).
Our interpolator will have to perform a numeric conversion or complain
if the string literal is not a legal octal number.
(For simplicity, we ignore interpolated expressions in this example.)

We access the string literal using the parts method of the
StringContext, which returns a sequence of all of the constant string
parts from the literal.
In this case there will only be one since we are not supporting
interpolated expressions.

Once we have the string, a regular expression pattern checks whether
the literal is in the correct form or not.
If the form is ok, we convert the string and return its integer value.
Otherwise, we throw an error to complain about the format violation.
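
As a minimal sketch of the run-time approach just described (an illustration, not necessarily the code originally shown here), the interpolator could look roughly like this. The object name OctalRuntime, the exact regular expression and the error message are assumptions made for the example.

```scala
object OctalRuntime {

  // Implicit class that makes the `o` interpolator available on string literals.
  implicit class OctalContext(val sc: StringContext) {

    // o"177" desugars to OctalContext(StringContext("177")).o()
    def o(): Int = {
      // Interpolated expressions are not supported, so there is a single part.
      val orig = sc.parts.mkString
      if (orig.matches("[0-7]+"))
        Integer.parseInt(orig, 8) // may still throw if the value overflows Int
      else
        sys.error("octal literal must contain only digits 0-7: " + orig)
    }
  }
}
```

With import OctalRuntime._ in scope, o"177" evaluates to 127, while o"189" throws a RuntimeException when it is evaluated.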

Compile-time checking

Run-time errors are fine in some situations, but many of us would like
to get stronger guarantees about our code.
In particular, we’d like the compiler to complain if we try to write
an octal number literal but get the format wrong.

One way to achieve this kind of checking is to extend the compiler
with a macro.
Macros are a new experimental feature in Scala 2.10 and are planned to
be fully supported in 2.11.
In a nutshell, a macro is given access to the compiler’s abstract
syntax tree (AST) for a method call.
It can return a new AST that the compiler will use instead of the
original call.
Thus, the macro can replace the call that a user writes with any
legal expression.

(Strictly speaking, the macros we use here are def macros because
they implement the bodies of def constructs.
Other macro styles in development will be able to replace types and
other Scala constructs.)

Another alternative to get this kind of checking is a full-blown
compiler plugin.
A plugin can be more powerful than a macro, because it has full access
to the whole compilation unit.
However, writing a plugin is harder since much of the plumbing and
book-keeping must be implemented.
In contrast, the macro system takes care of many of the details of
hooking into the compilation process, accessing the abstract syntax
tree, and so on.
We can focus on the actual replacement that we want to construct.

In our experience Scala 2.10 macros are pretty reliable, but you
should be aware that they are experimental so it might be dangerous to
rely on them.
It’s also easy to get yourself or the compiler in a tangle when
writing a macro since you are essentially working with the compiler’s
internal representations.

An octal number literal macro

Our plan is to replace the octal number interpolator we wrote above
with one that is implemented by a macro.
The advantage is that the macro will execute at compile time so it
will be able to issue compile-time errors if we get the string literal
wrong.
In the non-error case the macro will be able to perform the number
conversion, thereby saving the program from having to perform it at
run-time.

First, we write the o method, but instead of giving the full
implementation, we use the macro keyword to indicate that this
method is a macro that is implemented by the OctalImpl.oImpl method.
This simple notation suffices to get the compiler to call the macro at
compile-time and use its result.

An import of scala.language.experimental.macros or the corresponding
command-line option will be necessary to enable the macros feature.
It is also necessary to compile the macro implementation separately
from, and before, the code that uses it, since the compiler needs to
have access to the compiled macro implementation when it compiles the
uses.

The macro signature

The signature of a macro is closely related to that of the method that
it implements.
The signature of the oImpl method that implements o is as follows.

def oImpl (c : Context) () : c.Expr[Int]

The first parameter list contains a Context argument that can be
used by the macro to find out about the context in which it has been
called.
The second parameter list contains one argument for each argument of
the method.
The arguments passed here are the Scala AST representations of the
argument expressions used in the original method call.
In our case, the o method has no parameters so the second parameter
list of the macro is empty.
Finally, the return value of the macro is a Scala AST that represents
the expression which we want to use to replace the macro call.
The return type is path-dependent on the context and says that the
value must be an expression whose type is the type of the original
method.
Thus, we have a return type of c.Expr[Int] here because o returns
an Int.
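
To make the correspondence between the two signatures concrete, here is a small sketch of a macro for a method that does take an argument. The names SignatureDemo, twice and twiceImpl are hypothetical and are not part of the octal example; the import of scala.reflect.macros.Context assumes the Scala 2.10 API.

```scala
import scala.language.experimental.macros
import scala.reflect.macros.Context // Scala 2.10 API (assumption)

object SignatureDemo {

  // Ordinary-looking method whose body is supplied by a macro.
  def twice(x: Int): Int = macro twiceImpl

  // First parameter list: the Context.
  // Second parameter list: one c.Expr per argument of `twice`, same names.
  // Return type: c.Expr of the method's result type.
  def twiceImpl(c: Context)(x: c.Expr[Int]): c.Expr[Int] = {
    import c.universe._
    // reify builds the tree for `x * 2`; splice inserts the argument's tree.
    reify(x.splice * 2)
  }
}
```

A call such as SignatureDemo.twice (21), compiled separately from the definition, would be replaced at compile time by the tree for 21 * 2.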

The context and the trees

The context gives us access to many wonderful things.
For example, the context provides position information for the macro
call and a way to issue error messages.

The context’s universe gives us the Scala AST node definitions which
we will need to query and construct trees.
To save space in the code, we introduce a local alias u for the
universe.

import c.{universe => u}

and import everything

import u._

Importantly for the octal number literal macro, the context gives us
access to the AST on which the method call in question has been made.
This tree is returned by c.prefix.tree.
The call o"177" desugars to

OctalContext (StringContext ("177")).o ()

after the implicit conversion has been applied.
The prefix tree represents the bit without the method call.

OctalContext (StringContext ("177"))

A common development approach is to print out the trees to see what
the compiler is giving you.
The u.show and u.showRaw methods are particularly useful for this
purpose.
For example, u.show (c.prefix.tree) returns the following in the
macro we are defining for the call o"177" (modulo some formatting).
This tree has fully-qualified names and explicitly calls the apply
method to construct the string context, but otherwise is the same as
above.

OctalMacros.OctalContext (scala.StringContext.apply ("177"))

We can see the actual tree nodes used in the representation of this
expression in the compiler by printing u.showRaw (c.prefix.tree)
(reformatted to make the structure clear).

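Once we have extracted the string literal from this tree (called orig
below), the format check is the same as in the run-time version.
The differences are in what is returned.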
If the format is ok, we perform the conversion to get the integer
value.
However, instead of returning that number, we return an AST that
represents an expression that evaluates to that number.
That expression is

Literal (Constant (Integer.parseInt (orig, 8)))

In other words, a literal integer constant containing the value we
want.
The c.Expr[Int] constructor combines the expression tree with its
type representation.

In the error case, we call the c.error method to report a compile-time
error.
c.enclosingPosition gives us the source code position of the
macro call.
c.error is a Unit method, so we need a dummy return value to
satisfy the return type.

Our macro implementation is complete.
In summary, the macro is given the AST that represents the call.
We delve into the AST to access the string literal from the call.
We process that literal to check its format.
If the format is ok, we convert it to an integer value and return an
AST that represents that value.
If the format is not ok, we trigger a compile-time error that says so.

Using the macro-based interpolator

The macro can be used (in a different compilation unit) in the same
way as our previous non-macro implementation.
This abstraction is nice since you can switch the implementation to a
macro without requiring users to rewrite their code.
Of course, they must recompile it.

def main (args : Array[String]) {
    println (o"177")
}

127

If we try a literal that is not a valid octal number, we get the
expected compile-time error.

Why use a string interpolator?

Some readers may be wondering why we bother with a string
interpolation for this example.
All of the work is being done by the macro and the interpolation is
not really contributing very much.
In this case that is true, although I quite like the o"177" syntax,
compared to something like o ("177") which we would have to use with
a pure macro implementation.

String interpolations come much more into their own as an interface to
macros when the string format is more complex and the processed
strings can contain interpolated expressions.
Having a standard and concise syntax to indicate where the expressions
are to be placed is a big advantage.
Otherwise, every macro writer needs to invent their own convention
to pass the pieces to the macro.
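
To illustrate what the standard syntax hands to an interpolator method, here is a small run-time sketch. The interpolator name pieces and the object PiecesDemo are hypothetical; the method simply returns the constant parts and the interpolated values that it receives.

```scala
object PiecesDemo {
  implicit class PiecesContext(val sc: StringContext) {
    // Returns the constant string parts and the interpolated values, unprocessed.
    def pieces(args: Any*): (List[String], List[Any]) =
      (sc.parts.toList, args.toList)
  }
}
```

With import PiecesDemo._ and val n = 3, the expression pieces"x = $n, y = ${n + 1}!" yields the parts List ("x = ", ", y = ", "!") and the arguments List (3, 4). A macro-based interpolator receives the same pieces, but as trees rather than values.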

What’s next?

The octal number example is a simple case of a very general problem.
The format of a processed string can be quite tightly defined, perhaps
by a formal syntax.
We would like to be informed by the compiler if we err in the syntax
of a string.
Later posts will revisit this issue.

First though, in the next post we flip the problem around.
Instead of using processed strings to construct values, we will use them
to deconstruct values via pattern matching.
We will see that the processed string syntax leads to quite concise
pattern matching of complex structures.