Friday, July 25, 2008

Stemming, Part 11: the cond-ends? Macro

In the last posting, we created the macro if-ends?, which was a combination
of let, if and the function ends?. Today, we’ll look at another similar
macro, one that combines let, cond, and ends?.

The Purpose of cond-ends?

The problem with if-ends? is that sometimes we’ll want to look for twenty or
more different endings on a single stemmer and handle each case differently.
We can do that with if-ends?, but it would be a lot cleaner if we had a
construct like cond.

First, notice that this starts out like if-ends?. It has a variable (st)
that the new stemmer will be assigned to, followed by a stemmer instance to
test.

These are followed by a series of ending/expression pairs. The endings are
tested one by one until a match is found. Once it is, the expression following
that ending is executed with the variable st available to it and assigned to
the new stemmer, with the index changed to reflect the ending. The return
value of that expression is the value of the entire cond-ends? expression.

Finally, if no ending matches, the cond-ends? should return the original
stemmer unchanged. Or, optionally, the last ending can be the keyword :else.
If no ending before that point matched, the expression following :else is
executed and its value returned.

Before we start on the expected output for this macro, let’s look at a short
expression that uses the default value if no ending matches:

Actually, it’s not that bad. Notice that after the first let, it is a series
of (let ... (if ...)) constructions. (I’ve added comments to separate each
of these sections.) Each of those three let/if constructions are almost
identical, until the :else expression. Also, notice that I’ve not only
pre-compiled the vector itself, but also the length of the vector, and I’ve
inserted them into the result example as their literal values. (Pre-computing
the length of the vector was suggested by Holger Durer in the comments to the
last posting.)

Before we start writing the macro, let’s look at the expected output for the
second example above.

One of the things to notice about these output expressions is that a lot of
the state for it can be computed once, and used throughout all the
conditionals in the cond-ends? body. That observation will be useful later.

The Macro

Looking at the output above, we really have three tasks: set up the
environment for the cond-ends? in the outer let; for each test generate a
let/if construction; and finally generate an :else value, maybe using
the default.

One way to tackle this problem would be to separate it into two macros:

One macro sets up the environment and calls another macro to build the
test expressions;

The second macro builds one let/if pair, similar to what if-ends?
did, for one test expression, then it calls itself on the rest of the test
expressions. It does this until it reaches the last test expression and
that test is :else or until it is out of test expressions.

Let’s start with the second macro. That will allow us to see what the first
macro needs to provide to the test macro, and it has to be defined first
anyway, because that’s the way Clojure likes things.

Generating the Tests

The test macro will be overloaded. Either it will take one ending/expression
pair, or it will take many. If there is only one, the ending could be an
:else, or it could be the last ending to test for, and the :else will be a
default. So the first, shorter version of this will return one of these two
constructions (taken from above):

Another point to make about the test expressions is that—as an astute
commenter pointed out (thanks, Holger)—there is a lot of duplication between
the two versions of the macro that builds the expressions. We can capture that
duplication in a function and just call that function with the difference
between the two structures. The function will then create the macro’s output
structure.

All right. Let’s roll up our sleeves. The helper function will be called
make-cond-ends-test. It take the variable symbol, the stemmer variable, an
ending string, a true expression, and a false expression.

Notice that this function requires several parameters beyond the ending
(end) and the result expressions (true-expr and false-expr)? It also has
var, stemmer, and word. But the stemmer and the subword should have
been stored in variables in the first, outer let by this point. That’s
right. In this function, these are the variables that those values were
stored in, represented here as symbols. Throughout this helper function,
~stemmer inserts the variable that the stemmer will have been assigned to in
the outer let.

Now we can define the secondary macro. We’ll call it cond-ends-helper,
because I’m not feeling creative. It’s main purpose is to examine the input
and call make-cond-ends-test with the appropriate false expression. The
first overload for it will handle the last expression by testing for :else
and deciding whether to use the default :else:

What happens here? First, at compile time (before the quasi-quote or call to
make-cond-ends-test), it tests whether the ending is :else. If it is, it
just outputs the shorter version of the ending. If it isn’t :else, it calls
the helper function, with all the same parameters plus the false expression as
a data structure. In this case, the false expression is the stemmer variable.

If your head is starting to hurt, take a moment. Unless you have programmed in
lisp before, this isn’t like any programming you’ve ever done. We’re writing a
program to write programs. Like any meta-activity, it can make your brain
explode. Unfortunately, this is a particularly convoluted macro, because it
has these two levels of macro, plus a function. Don’t worry if you don’t quite
get it at first. There’s nothing wrong with copying and pasting the macro’s
code into your source file and moving on. Start building macros like debug
and if-ends?, and when you’re comfortable there, come back and read this
through again. It will make a lot more sense.

Now, if you’re still with me, here’s the overloaded version of the helper
macro. This takes an ending/expression pair and a list of the rest of the
pairs in the expression. It then calls make-cond-ends-test with a new
version of

In some ways, this part of cond-ends-helper is much simpler. There are just
a few things to comment on.

In the parameter list, & more collects the values of the rest of the
parameters and stores them in a variable I’ve named more. This works
with functions too.

The false expression that this passes to make-cond-ends-test begins with
a quasi-quote (back quote). This just generates an expression to pass as
the false expression. This expression gets inserted into the expression
that make-conds-ends-test creates.

This quasi-quoted expression, when finally evaluated, is a macro call to
the macro that created it. It starts the process all over again, passing
the new macro expansion the values of var, stemmer, and word
unchanged, and splicing all of the rest of the parameters that haven’t
been handled yet into the expression using ~@more.

Creating the Environment

So we know that the cond-ends? sets up the environment by creating an
expression like this:

;; Set up the environment(let [stemmer#(make-stemmer"names"),word#(subwordstemmer#)]; ...)

And it replaces the ellipses by calling the macro cond-ends-helper with the
variable and the variables stemmer# and word#. Put like that, it almost
writes itself.

(defmacro cond-ends?"This is the same as a stacked series of if-ends?. This just sets up the environment for cond-ends-helper."[var stemmer&test-exprs]`(let [stemmer#~stemmer,word#(subwordstemmer#)](cond-ends-helper~varstemmer#word#~@test-exprs)))

Notice that we use & test-exprs to collect all of the test expressions, and
we use ~@test-exprs to splice them into the cond-ends-helper call.

Summary

For posterity’s sake, here is all the code for the cond-ends? macro, in one
place:

4 comments:

First of all, the obligatory, but extremely well-deserved, thanks for this series. It's *really* interesting! I've just discovered it and read all the posts up to here in the last couple of days.

Not being steeped in macros, perhaps I oughtn't comment, but it seems to me that:

1. if-ends? should make use of the ends? function rather than include the logic directly in the macro

2. cond-ends? should use cond (and ends?) rather than tackling the harder task of building nested lets and ifs.

The reason I suggest this is to make it easier for beginners to understand. If there are performance reasons to do as much work as you've done, that's probably not enough justification in a pedagogical example. If there are "correctness" reasons for it, then maybe they should be discussed.

Thanks again. Looking forward to reading the rest, and any other series you feel like producing :)

Glad you're getting something out of it. Unfortunately, it's far from finished at this point, although it does have a lot of useful information. (I also need to update the code to reflect the latest changes in Clojure, but that's another story.)

Now for your questions. (I've answered them together, since they're really the same question.)

I'm not sure that using ends? would make cond-ends? that much simpler, since ends? isn't really a straight predicate function. You'd still have to test at every stage. You might be able to wrap ends? in another function and use "or" to short-cut the tests. That would work, although it might also be a little too clever. You'd also still need to test for the ":else" condition.

You're right, though, the primary reason why I approached things this way was for performance. It does also illustrate one of the advantages of macros, though: they can "insert" expressions into the code at compile time, so that whenever it's executed, it's just a constant. For example, in "cond-ends?", the length of each of the endings is pre-computed ("~(count vend)") and inserted into the resulting expression as a constant integer. Similarly, the ending vector itself is a constant, although I'd have to think about whether that gains anything or not.

Pedagogically, I really should have stuck with a simpler stemming algorithm, as someone suggested (and you commented on) on another posting. That's one of the joys of blogs or any other serial publishing format: by the time realize what a thicket you're in, you may be too far along to turn back.

But you're probably right, using a simpler macro here would have made the explanation much simpler, too.

Thanks for your detailed response. One unfortunate thing I find about Clojure is that it's often hard for me to understand unless I'm writing it (and thereby experimenting at the REPL). Accordingly, I don't fully follow what you've written because it's been some time (a week or so) since I read and grokked the code involved. The solution to this is for me to type in all the code and try to implement if-ends? and cond-ends? differently myself. I'll let you know how I go (won't be very soon though).