Why in the world does a Racket program need to say that it’s a Racket
program? Isn’t that obvious?

We can understand the situation better by looking at another
environment on our desktop, namely the web browser. A web browser
supports different kinds of HTML variants, since HTML is a moving
target, and browsers have come up with crazy rules for figuring out
how to take an arbitrary document and decide what HTML parsing rules
to apply to it.

HTML 5 tries to make this determination
somewhat more straightforward: we can define an HTML 5 document by
putting a DOCTYPE declaration at the very top of the file which
self-describes the document as being html.

<!DOCTYPE html>
<html lang="en">
<head><title>Hello world</title></head>
<body><p>Hello world!</p></body>
</html>

Going back to the world of Racket, we see by analogy that the #lang
line in a Racket program is a self-description of how to treat the
rest of the program. (Actually, the #lang line is quite a bit more
active than this, but we’ll get to this in a moment.)

The racket part in the #lang line isn’t inevitable: the main Racket
distribution, in fact, comes bundled with several languages which can
take the place of the word racket. Many of these languages
(racket/base, typed/racket, lazy) still look like Racket... but some
of them don’t. One example is the program saved as "hello2.rkt" near the
end of this tutorial, which starts with the line #lang planet dyoo/bf.

This language does not look like Racket. It looks like line
noise. This is brainf*ck. Although this language is not included in the
main distribution, because it is on PLaneT, anyone with Racket
can easily play with it.

Ignoring the question of why?!! someone would do this, let’s ask another:
how do we build this? This tutorial will cover how to build this language
into Racket from scratch.

Let’s get started!

2 The view from high orbit

As mentioned earlier, a #lang line is quite active: it tells the Racket runtime how to
convert from the surface syntax to a meaningful program. Programs in Racket get digested
in a few stages; the process looks something like this:

                 reader        macro expansion
surface syntax ---------> AST -----------------> core forms

When Racket sees
#lang planet dyoo/bf, it will look for a particular module that we call a reader;
a reader consumes surface syntax and excretes ASTs, and these ASTs are then
annotated so that Racket knows how to make sense out of them later on.
At this point, the rest of the Racket infrastructure kicks in and macro-expands the ASTs out, ultimately,
to a core language.
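The two stages can be tried out by hand on ordinary Racket syntax. The following is only a sketch: it uses Racket's built-in read-syntax as the reader (not the brainf*ck reader we are about to write), and the names ast and core are made up for illustration.

```racket
#lang racket

;; Stage 1, the reader: turn surface text into a syntax object (the AST).
(define ast
  (read-syntax 'example
               (open-input-string "(when #t (displayln \"hi\"))")))

;; Stage 2, macro expansion: rewrite the AST until only core forms remain.
;; expand needs a namespace that knows what `when` means, so we supply one.
(define core
  (parameterize ([current-namespace (make-base-namespace)])
    (expand ast)))

(syntax->datum ast)   ; the program as it was written
(syntax->datum core)  ; the same program, phrased with core forms such as if
```

The `when` macro disappears during expansion; what is left is built from the small set of core forms that Racket itself understands.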

So here’s what we’ll do:

Capture the meaning of brainf*ck by writing a semantics module.

Go from the line noise of the surface syntax into a more structured form
by writing a parser module.

Connect the pieces, the semantics and the surface syntax parser,
by making a reader module.

Profit!

3 Flight preparations

Since we’re starting from scratch, let’s first make a work directory
where we’ll keep our source code. I’ll call the directory "bf/", but you can use
whatever name you want.

$ mkdir bf

Ultimately, we want to put the fruit of our labor onto PLaneT,
since that’ll make it easier for others to use our work.
Let’s set up a PLaneT development link so the Racket environment knows about our work directory. I already have an account
on PLaneT with my username dyoo. You can
get an account fairly easily.

If we enter the following at the command line,

$ planet link dyoo bf.plt 1 0 bf

we’ll make a development link that will associate any module path of the form (planet dyoo/bf/...)
to our local "bf/" directory. Later on, when we create a package and upload it to PLaneT,
we can drop this development link, and then all the references that use (planet dyoo/bf/...) will
immediately switch over to the one on the PLaneT server.

But does the link actually work? Let’s write a very simple module in our work directory, and
then see that Racket can find it through PLaneT.

$ cd bf
~/bf$ cat >hello.rkt
#lang racket
"hello world"

Ok, let’s see if Racket can find our magnificent "hello.rkt" module if we use the PLaneTized version of the name.

~/bf$ racket
Welcome to Racket v5.1.1.
> (require (planet dyoo/bf/hello))
"hello world"
>

If we get to this point, then we’ve got the PLaneT development link in place.

4 The brainf*ck language

When we look at the definition of brainf*ck,
it’s actually not too bad. There are two bits of state,

a byte array of data, and

a pointer into that data array

and it has only a few operations that affect this state:

Increment the data pointer (>)

Decrement the data pointer (<)

Increment the byte at the data pointer (+)

Decrement the byte at the data pointer (-)

Write a byte to standard output (.)

Read a byte from standard input (,)

Perform a loop until the byte at the data pointer is zero ([, ])

Let’s write a module that lets us play with such a system: let’s call it "semantics.rkt".
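What follows is a minimal sketch of what such a "semantics.rkt" could look like, not a definitive implementation. The function names, the helper functions current-byte and set-current-byte!, and the size of 30000 cells are assumptions made for illustration. As the text goes on to admit, nothing here checks bounds or guards against eof.

```racket
#lang racket

(require rackunit)            ; for the small tests at the bottom
(provide (all-defined-out))

;; The two bits of state: the data array and a pointer into it.
(struct state (data [ptr #:mutable]))

;; A fresh state. 30000 zeroed cells is an assumed, conventional size.
(define (new-state)
  (state (make-vector 30000 0) 0))

;; Hypothetical helpers for the cell under the pointer.
(define (current-byte a-state)
  (vector-ref (state-data a-state) (state-ptr a-state)))
(define (set-current-byte! a-state v)
  (vector-set! (state-data a-state) (state-ptr a-state) v))

(define (increment-ptr a-state)          ; >
  (set-state-ptr! a-state (add1 (state-ptr a-state))))
(define (decrement-ptr a-state)          ; <
  (set-state-ptr! a-state (sub1 (state-ptr a-state))))
(define (increment-byte a-state)         ; +
  (set-current-byte! a-state (add1 (current-byte a-state))))
(define (decrement-byte a-state)         ; -
  (set-current-byte! a-state (sub1 (current-byte a-state))))
(define (write-byte-to-stdout a-state)   ; .
  (write-byte (current-byte a-state) (current-output-port)))
(define (read-byte-from-stdin a-state)   ; ,
  (set-current-byte! a-state (read-byte (current-input-port))))

;; [ ... ]: repeat the body until the byte at the pointer is zero.
(define-syntax-rule (loop a-state body ...)
  (let repeat ()
    (unless (zero? (current-byte a-state))
      body ...
      (repeat))))

;; Simple tests.
(let ([s (new-state)])
  (increment-byte s)
  (check-equal? (current-byte s) 1)
  (increment-ptr s)
  (check-equal? (state-ptr s) 1)
  (decrement-ptr s)
  (decrement-byte s)
  (check-equal? (current-byte s) 0))

(let ([s (new-state)])
  (for ([i 5]) (increment-byte s))
  (loop s (decrement-byte s))
  (check-equal? (current-byte s) 0))

(let ([s (new-state)])
  (parameterize ([current-input-port (open-input-bytes #"h")])
    (read-byte-from-stdin s))
  (check-equal? (current-byte s) 104))
```

The loop is a macro rather than a function because its body must be re-evaluated on every pass.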

Good! Our tests, at the very least, let us know that our definitions are
doing something reasonable, and they should all pass.

However, there are a few things that we may want to fix in
the future, like the lack
of error trapping if the input stream contains eof. And there’s no bounds-checking
on the ptr or on the values in the data. Wow, there are quite a few things that we might want
to fix. But at the very least, we now have a module that captures the semantics of brainf*ck.

5 Lisping a language

We might even be cheeky enough to insist that people write brainf*ck programs with s-expressions.
Let’s take that route, and create a module language
that uses our "semantics.rkt". We’ll create such a module language in "language.rkt".

This "language.rkt" presents brainf*ck as an s-expression-based language.
It uses the semantics we’ve coded up, and defines rules for handling
greater-than, less-than, etc. We have a parameter called current-state
that holds the state of the brainf*ck machine that’s used through the language.
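To make the shape of a module language concrete, here is a cut-down, single-file sketch. It is not "language.rkt" itself: the machine state is shrunk to one mutable cell, and tiny-language, report, get-reports, program-a and program-b are hypothetical names invented so the example can run on its own. The parts that carry over are the current-state parameter and the renamed #%module-begin, here called my-module-begin.

```racket
#lang racket

;; A stand-in for a module language, kept in a submodule so that
;; everything fits in one file.
(module tiny-language racket
  (provide plus minus report get-reports
           (rename-out [my-module-begin #%module-begin]))

  ;; The whole "machine" is a single mutable cell.
  (define (new-state) (box 0))

  ;; The state used by every form in the language.
  (define current-state (make-parameter (new-state)))

  ;; A shared log, present only so we can observe each program's result.
  (define reports '())
  (define (record! v) (set! reports (cons v reports)))
  (define (get-reports) (reverse reports))

  ;; Every module written in this language has its body wrapped so that
  ;; it runs with its own fresh state.
  (define-syntax-rule (my-module-begin body ...)
    (#%plain-module-begin
     (parameterize ([current-state (new-state)])
       body ...)))

  (define-syntax-rule (plus)
    (set-box! (current-state) (add1 (unbox (current-state)))))
  (define-syntax-rule (minus)
    (set-box! (current-state) (sub1 (unbox (current-state)))))
  (define-syntax-rule (report)
    (record! (unbox (current-state)))))

;; Two programs written in the tiny language.
(module program-a (submod ".." tiny-language)
  (plus) (plus) (plus) (report))

(module program-b (submod ".." tiny-language)
  (plus) (minus) (plus) (report))

(require 'program-a 'program-b
         (only-in 'tiny-language get-reports))

(get-reports)  ; '(3 1)
```

Because each module body is wrapped in its own parameterize, program-b starts from zero and reports 1. Had both programs shared one state, it would have reported 4.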

There’s one piece of this language that looks particularly mysterious: what’s the #%module-begin form,
and what is it doing? In Racket, every
module has an implicit #%module-begin that wraps around the entirety of the module’s body.
We can see this by asking Racket to show us the results of the expansion process;
here’s a small example to demonstrate.
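One way to run that experiment is sketched below. The module name an-example-module and the variable expanded are arbitrary, and the explicit namespace is only there so that the snippet also works when run as a file rather than typed at the REPL. The exact printed form of the body can differ between Racket versions, so no output is shown.

```racket
#lang racket

;; Expand a tiny module written in the bare '#%kernel language and turn
;; the resulting syntax object back into a plain s-expression.
(define expanded
  (parameterize ([current-namespace (make-base-namespace)])
    (syntax->datum
     (expand '(module an-example-module '#%kernel
                "hello"
                "world")))))

;; The module body comes back wrapped in a (#%module-begin ...) form.
expanded
```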

Ignore, for the moment, the use of syntax->datum or the funky use of '#%kernel.
What we should notice
is that Racket has added in that #%module-begin around the "hello" and "world".
So there’s the implicit wrapping that Racket is doing.

It turns out that #%module-begin can be really useful! In particular,
we want to guarantee that every module written in brainf*ck runs under a fresh state. If
we had two brainf*ck programs running, say like this:

then it would be a shame to have the two programs clash just because they brainf*cked each other’s data!
By defining our own #%module-begin, we can ensure that each brainf*ck module has
its own fresh version of the state, and our definition of my-module-begin
does this for us.

A program in this language begins with the line #lang s-exp (planet dyoo/bf/language).
That #lang line is saying, essentially, that the following program
is written with s-expressions, and should be treated with the module language "language.rkt"
that we just wrote up. And if we run a hello-world program written this way, we should see a familiar greeting.
Hurrah!

... But wait! We can’t just declare victory here. We really do want
to allow the throngs of brainf*ck programmers to write brainf*ck in the surface syntax that
they deserve.
Keep "language.rkt" on hand, though. We will reuse it by having our
parser transform the surface syntax into the forms we defined in "language.rkt".

Let’s get that parser working!

6 Parsing the surface syntax

The Racket toolchain includes a professional-strength lexer and parser
in the parser-tools collection.
For the sake of keeping this example terse, we’ll
write a simple recursive-descent parser without using the parser-tools collection. (But if our surface
syntax were any more complicated, we might reconsider this decision.)

The expected output of a successful parse should be some kind of abstract syntax tree. What representation
should we use for the tree? Although we can use s-expressions,
they’re pretty lossy: they don’t record where they came from
in the original source text. For the case of brainf*ck, we might not care,
but if we were to write a parser for a more professional,
sophisticated language (like LOLCODE), we would
want source locations so we can give good error messages during parsing or run-time.

As an alternative to plain s-expressions, we’ll use a data structure built into Racket called a
syntax object; syntax objects let
us represent ASTs, just like s-expressions, and they also carry along auxiliary
information, such as source locations. Plus, as we briefly saw in our play with expand, syntax objects are the native data structure that Racket
itself uses during macro expansion, so we might as well use them ourselves.

The first argument that we pass into datum->syntax lets us tell Racket any
lexical-scoping information that we know about the datum, but in this case, we don’t have
any on hand, so we just give it #f. Let’s look at the structure of this syntax object.

So a syntax object is a wrapper around an s-expression, and we can get the underlying datum by using syntax->datum.
Furthermore, this object remembers where it came from, and that it was on line 1, column 20, position 32, and was five characters long:
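The datum->syntax call discussed above, and the accessors that read the location back out, can be sketched as follows. The variable name an-example-syntax-object and the source name "hello.rkt" are chosen for illustration; the numbers are the ones quoted in the text.

```racket
#lang racket

;; The third argument is a source location given as
;; (list source line column position span).
(define an-example-syntax-object
  (datum->syntax #f 'hello (list "hello.rkt" 1 20 32 5)))

(syntax? an-example-syntax-object)           ; #t
(syntax->datum an-example-syntax-object)     ; 'hello
(syntax-source an-example-syntax-object)     ; "hello.rkt"
(syntax-line an-example-syntax-object)       ; 1
(syntax-column an-example-syntax-object)     ; 20
(syntax-position an-example-syntax-object)   ; 32
(syntax-span an-example-syntax-object)       ; 5
```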

Now that we have some experience playing with syntax objects, let’s write a parser.
Our parser will consume an input-port,
from which we can read in bytes with read-byte, or find out where we are with port-next-location. We also want to store some record of where our program originated from,
so our parser will also take in a source-name parameter.
We’ll write the following into "parser.rkt".
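Here is a sketch of a recursive-descent parser along these lines, offered as one possible shape rather than the definitive "parser.rkt". The helper names decorate and parse-exprs are assumptions, and the sketch reads characters with read-char (brainf*ck source is plain ASCII) instead of raw bytes. The form names it emits (plus, brackets and so on) are the ones mentioned for "language.rkt".

```racket
#lang racket

(provide parse-expr)

;; parse-expr: any input-port -> (or/c syntax? eof-object?)
;; Reads one brainf*ck expression, skipping characters that are not commands.
(define (parse-expr source-name in)
  (define-values (line column position) (port-next-location in))
  (define next-char (read-char in))

  ;; Wrap an s-expression with the location where it started.
  (define (decorate sexp span)
    (datum->syntax #f sexp (list source-name line column position span)))

  (cond
    [(eof-object? next-char) eof]
    [else
     (case next-char
       [(#\<) (decorate '(less-than) 1)]
       [(#\>) (decorate '(greater-than) 1)]
       [(#\+) (decorate '(plus) 1)]
       [(#\-) (decorate '(minus) 1)]
       [(#\,) (decorate '(comma) 1)]
       [(#\.) (decorate '(period) 1)]
       [(#\[)
        ;; The messy case: read everything up to the matching ], then
        ;; wrap the pieces in a single brackets form.
        (let*-values ([(elements) (parse-exprs source-name in)]
                      [(l c tail-position) (port-next-location in)])
          (decorate `(brackets ,@elements)
                    (and position tail-position
                         (- tail-position position))))]
       [else (parse-expr source-name in)])]))

;; parse-exprs: any input-port -> (listof syntax?)
;; Reads expressions until the closing ] and consumes that bracket.
(define (parse-exprs source-name in)
  (define peeked-char (peek-char in))
  (cond
    [(eof-object? peeked-char)
     (error 'parse-exprs "Expected ], but read eof")]
    [(char=? peeked-char #\])
     (read-char in)
     '()]
    [(memv peeked-char '(#\< #\> #\+ #\- #\, #\. #\[))
     (cons (parse-expr source-name in)
           (parse-exprs source-name in))]
    [else
     (read-char in)
     (parse-exprs source-name in)]))

;; Trying it out on a string port.
(define in (open-input-string "+[->+<]"))
(port-count-lines! in)                            ; track lines and columns
(define first-expr (parse-expr "demo.rkt" in))    ; reads "+"
(define second-expr (parse-expr "demo.rkt" in))   ; reads the whole loop

(syntax->datum first-expr)    ; '(plus)
(syntax->datum second-expr)   ; '(brackets (minus) (greater-than) (plus) (less-than))
(syntax-span second-expr)     ; 6
```

Each call to parse-expr consumes exactly one expression, so a reader can call it repeatedly until it returns eof.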

This parser isn’t anything too tricky, although there’s a little bit of
messiness because it needs to handle brackets recursively. That part
is supposed to be a little messy anyway, since it’s the capstone that builds tree structure out
of a linear character stream. (If we were using a parenthesized language, we
could simply use read-syntax, but the whole point is to deal
with the messiness of the surface syntax!)

And as we can see, we can explode the syntax object and look at its datum. We should note
that the parser is generating syntax objects that use the same names as the defined names we
have in our "language.rkt" module language. Yup, that’s deliberate, and we’ll see why in
the next section.

We mentioned that the parser wasn’t too hard... but then again, we haven’t written good traps
for error conditions. This parser is a baby parser.
If we were more rigorous, we’d probably implement it with the parser-tools collection,
write unit tests for the parser with rackunit, and
make sure to produce good error messages when Bad Things happen
(like having unbalanced brackets or parentheses).

Still, we’ve now got the language and a parser. How do we tie them together?

7 Crossing the wires

This part is fairly straightforward. We have two pieces in hand:

A parser in "parser.rkt" for the surface syntax that produces ASTs

A module language in "language.rkt" that provides the meaning for those ASTs.

To combine these two pieces together, we want to define a reader that associates the two.
When Racket encounters a #lang line of the form #lang planet dyoo/bf,
it will look for a reader module in "lang/reader.rkt" and use it to parse the file.

Racket provides a helper module called syntax/module-reader to handle most of the
dirty work; let’s use it. Make a "lang/" subdirectory, and create "reader.rkt"
in that subdirectory, with the following content:
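A sketch of such a reader module is shown below. It assumes the parser lives one directory up in "parser.rkt" and exports parse-expr, and that the module language is reachable as (planet dyoo/bf/language) through the development link. The helper names my-read and my-read-syntax are made up for illustration.

```racket
#lang s-exp syntax/module-reader
(planet dyoo/bf/language)
#:read my-read
#:read-syntax my-read-syntax

(require "../parser.rkt")

;; read returns plain data, so strip the syntax wrapper.
(define (my-read in)
  (syntax->datum (my-read-syntax #f in)))

;; read-syntax hands back the syntax objects built by our parser.
(define (my-read-syntax src in)
  (parse-expr src in))
```

This file only makes sense next to the other pieces, so it cannot be run on its own.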

The second line of the file tells syntax/module-reader that any syntax objects that
come out are intended to take on their semantics from our language. syntax/module-reader
is predisposed to assume that programs are read using read and read-syntax, so we
override that default and plug our parse-expr function into place.

Now that we have all these pieces together, does any of this work? Let’s try it!

$ cat hello2.rkt
#lang planet dyoo/bf
++++++[>++++++++++++<-]>.
>++++++++++[>++++++++++<-]>+.
+++++++..+++.>++++[>+++++++++++<-]>.
<+++[>----<-]>.<<<<<+++[>+++++<-]>.
>>.+++.------.--------.>>+.
$ racket hello2.rkt
Hello, World!

Sweet, sweet words.

8 Landing on PLaneT

Finally, we want to get this work onto PLaneT so that other people can share in the joy
of writing brainf*ck in Racket. Let’s do it!

First, let’s go back to the parent of our work directory. Once we’re there, we’ll use the planet create command.

$ planet create bf
planet create bf
MzTarring ./...
MzTarring ./lang...
WARNING:
Package has no info.rkt file. This means it will not have a description or documentation on the PLaneT web site.
$ ls -l bf.plt
-rw-rw-r-- 1 dyoo nogroup 3358 Jun 12 19:39 bf.plt

There are a few warnings, because we haven’t defined an "info.rkt" which provides metadata
about our package. Good, diligent citizens would write an "info.rkt" file, so let’s write one.
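As a rough sketch, an "info.rkt" for a PLaneT package might look like the following. The field names are PLaneT metadata fields, but every value shown (the name string, category, version numbers, blurb and release notes) is an illustrative assumption rather than the package's actual metadata.

```racket
#lang setup/infotab
;; All values below are placeholders chosen for illustration.
(define name "bf: a brainf*ck compiler for Racket")
(define categories '(devtools))
(define can-be-loaded-with 'all)
(define required-core-version "5.1.1")
(define version "1.0")
(define repositories '("4.x"))
(define scribblings '())
(define primary-file "language.rkt")
(define blurb
  '("Provides support for the brainf*ck language."))
(define release-notes
  '((p "First release")))
```

With this file in the work directory, running planet create again should no longer warn about a missing description.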

Good! This simulates the situation where the package has been installed from PLaneT.

Once we’re satisfied with the package’s contents, we can finally upload it onto PLaneT.
If you log onto planet.racket-lang.org,
the user interface will allow
you to upload your "bf.plt" package.

9 Acknowledgements

Thanks to Shriram Krishnamurthi for being understanding
when I told him I had coded a brainf*ck compiler. Shoutouts to the PLT group at
Brown University — this one is for you guys. :)