Wednesday, March 24, 2010

I had to struggle a bit to come up with a satisfying Clojure environment on my Windows machine, so I thought it would be helpful to summarize how I did it, while it's still fresh in my memory. I'll be covering two independent but related tasks: setting it up for Cygwin, then for Emacs (using SLIME).

Update: A reader suggested using freshly built jars from build.clojure.org, a very good idea, which also means that roughly half of this post is not so useful anymore.

Update 2: Although I first describe setting up Clojure for Cygwin, the Emacs version I am referring to in the second part is actually the MinGW-compiled one, not the Cygwin one (see comments for more about this).

Cygwin Setup

My first experiment was with the precompiled jar files available on the Clojure Google Code page. It worked very well, but then I needed some additional string functions that could only be found in the 1.2 master branch of clojure-contrib at the time this was written, so I had to compile from source. For this you need a few things: the JDK (I'm pretty sure the JRE alone wouldn't be enough), git (which you can install through the Cygwin Package Manager), Apache Ant (I used version 1.8.0) and Apache Maven (I used version 2.2.1). Once all that is installed and properly configured (it helps to have them all available on your PATH as well), you can download a copy of the Clojure source:

$ git clone git://github.com/richhickey/clojure.git

this command will actually create a "clojure" source directory right where you execute it, so you can "cd" to it, and then simply do:

$ ant

to compile. If everything goes all right, you should see BUILD SUCCESSFUL near the end, and then the Clojure jar file is ready to use. Next we build the clojure-contrib library, by first downloading it:

$ git clone git://github.com/richhickey/clojure-contrib.git

and then building it, with Maven this time, after having "cd"ed into the "clojure-contrib" sub-directory just created:

$ mvn package

..although for me, at the time of writing, this fails, with some testing error! If this is the case for you too, the trick is to skip the testing phase, by using instead:

$ mvn -Dmaven.test.skip=true install

You can now collect the two newly created jar files (clojure-1.X___.jar and clojure-contrib.1.X___.jar) and copy them into a single folder, which will be handier for the remaining steps: we will refer to this location as <clojure_path> from now on. For convenience, I also added the JLine jar file, which adds useful features to the Clojure REPL, like history navigation (using the up/down arrows). The next step is to create a Bash script to start up your Clojure programs. Following the convention, create a file named clj, place it in <clojure_path>, and edit it so that it contains, for instance:

Please note that in this hybrid setup (Cygwin Bash, using Windows Java) the double quote and semicolon syntax must be carefully respected. Once this file is available on your PATH, you should be able to start a Clojure REPL by simply calling it:

$ clj

or execute a Clojure script:

$ clj your_script.clj [arg1 arg2..]
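
As an illustration of what such a script might look like, here is a minimal sketch. It assumes the clj wrapper hands the script and its arguments over to clojure.main, which binds the remaining arguments to *command-line-args*; the function name describe-args is made up for this example.

```clojure
;; your_script.clj (hypothetical example script)

(defn describe-args
  "Build a one-line summary of the arguments passed to the script."
  [args]
  (str "Received " (count args) " argument(s): "
       (apply str (interpose ", " args))))

;; *command-line-args* is nil when no extra arguments were given.
(println (describe-args *command-line-args*))
```

Keeping the logic in a function and only printing at the top level makes the script easy to try from the REPL as well.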

Emacs Setup

With a little more fiddling around, I was able to set up a Clojure environment for Emacs 23 on Windows as well (the MinGW-compiled version of Emacs, not the Cygwin-based one). Since the SLIME environment can download and install a Clojure environment all by itself, not much should remain to be said. However, I wanted it to use the latest binaries I had just compiled (not the 1.1 ones that were the default at the time I wrote this) and that is what I am going to describe here. It is very easy!

The first step is to install the very fine ELPA package manager for Emacs. Then invoke it using:

M-x package-list-packages

and put an "i" next to those four packages: clojure-mode, slime, slime-repl and swank-clojure. Then press "x" to download and install them all. Once done, you can summon SLIME:

M-x slime

and answer "y" when it asks you about installing a missing Clojure environment (even if you did build and install a fresh one already, like me, previously). Once it's installed, if you invoke SLIME again, it should launch a default 1.1 REPL. But if, like me, you are interested in running it with your own compiled jar files, you must do two things. The first is to locate the swank-clojure-1.X.X.jar file that the SLIME/Swank-Clojure setup has produced (for me it was located in c:\.swank-clojure), and copy it to your <clojure_path> (that is, the unique location where all the Clojure jar files you want to use should be found). Once this is done, the only thing that remains is to instruct SLIME and Swank-Clojure about this change, by setting a single variable in your .emacs file, like this:

Monday, March 22, 2010

I wanted to learn a bit of Clojure, but I only had a couple of Emacs Lisp notions. So I thought it would be fun to try writing a mini Lisp interpreter in Clojure (using Emacs, to add a level of self-reference!) and document the process. But first please consider (1) that I tried not to consult any book or website about parsing or interpreting Lisp for this exercise (other than my memories of some vague CS notions), and (2) that I was not, and have not become, a Lisp expert in any way, so it is pretty obvious that this will contain blatant errors, lack of stylistic taste, grossly non-optimized algorithms and wrong use (or abuse) of Lisp terminology. Please read it only for the sake of the learning process of someone new to Lisp/Clojure, but very enthusiastic about it.
I'll be using clojure-contrib version 1.2 (from the current master branch), since its "string" API provides some very useful functions (that seem to not be available in the 1.1 version):

(require '[clojure.contrib.string :as str])

If we assume that we will be processing a mini Lisp source file as one big string, a first step is to strip the comments from it, using a simple regular expression:

(defn strip-comments [s]
  ;; Drop everything from a semicolon up to (and including) the end of its line.
  (str/join "" (str/split #";.*\n?" s)))

We can then tokenize the input string, by giving the partition function (from the "string" API) a regular expression that chunks the input file content using parentheses and white space separators (while not breaking string literals, which may contain such characters). Note that one important feature of the partition function (that we will make use of) is that it retains the separators in the result list, which is why we need to trim and filter it, in order to remove the unwanted empty strings:
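
As an illustration, a tokenizer along these lines can be sketched with re-seq from clojure.core instead of the partition function described above (so there are no separators to trim or filter afterwards). The name tokenize and the regular expression are assumptions made for this sketch.

```clojure
(defn tokenize
  "Split a Lisp source string into a sequence of token strings.
   A token is a string literal, a single parenthesis, or an atom."
  [s]
  (re-seq #"\"(?:[^\"\\]|\\.)*\"|[()]|[^\s()\"]+" s))

;; Parentheses and white space inside a string literal do not split it:
(tokenize "(print \"a (b)\")")
```

The first alternative of the regular expression matches a whole string literal, which is why it has to come before the ones for parentheses and atoms.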

Being done with the preprocessing, we are now ready to parse our list of tokens, that is, build a syntax tree representation of the input program, in which every atom is a leaf, and every list is a subtree. Thanks to the particular Lisp syntax, this is relatively easy.. although I must admit that I didn't find the solution as easily as this sounds:
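
One possible way to do this, as a sketch only: a recursive function that consumes tokens until it meets a closing parenthesis, and returns both the forms collected so far and the tokens that remain. The name parse-tokens, and the choice of keeping atoms as plain strings and lists as vectors, are assumptions of this example.

```clojure
(defn parse-tokens
  "Parse a sequence of token strings.
   Returns a pair [forms remaining-tokens], where every list in the
   input has become a nested vector and every atom is kept as a string."
  [tokens]
  (loop [ts tokens, acc []]
    (let [t (first ts)]
      (cond
        (nil? t)  [acc nil]             ; end of input
        (= t ")") [acc (rest ts)]       ; end of the current list
        (= t "(") (let [[sub more] (parse-tokens (rest ts))]
                    (recur more (conj acc sub)))
        :else     (recur (rest ts) (conj acc t))))))

;; The syntax tree of a whole program is the first element of the pair:
(first (parse-tokens ["(" "+" "1" "(" "*" "2" "3" ")" ")"]))
```

Returning the remaining tokens is what lets the caller carry on right after the sub-list that was just parsed.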

We now need to extract symbols from this syntax tree: functions, defined with the defun special form (remember this is a Lisp interpreter, not a Clojure one), and variables, defined with setq (setf would be much harder to implement I think, and is thus outside the scope of this simple interpreter). Each symbol will be stored in a Clojure map, and will point to a Clojure vector of two elements in the case of variables (a "var" identifier, and the variable node, as found in the syntax tree) and three elements in the case of functions (a "function" identifier, the list of function parameters, and a list of the function nodes, as found in the syntax tree). All this is accomplished with this function:
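
A minimal sketch of such an extraction, assuming the syntax tree is a vector of top-level forms in which lists are vectors and atoms are strings (the name extract-symbols and this representation are assumptions of the example):

```clojure
(defn extract-symbols
  "Build the symbol table from the top-level forms of a syntax tree.
   Variables map to [\"var\" node], functions to [\"function\" params body]."
  [tree]
  (reduce (fn [table node]
            (if (sequential? node)
              (let [[head sym & more] node]
                (cond
                  (= head "defun") (assoc table sym ["function" (first more) (rest more)])
                  (= head "setq")  (assoc table sym ["var" (first more)])
                  :else            table))
              table))
          {}
          tree))

(extract-symbols [["setq" "x" "10"]
                  ["defun" "sq" ["n"] ["*" "n" "n"]]
                  ["sq" "x"]])
```

Forms that are neither a defun nor a setq, like the final call above, leave the table untouched.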

When it reaches a function node, that is, a node that should do something, this is where we inject semantics (admittedly, quite a simple one) into our interpreter, which was concerned until now only with the "what" to compute, and not with the "how" to do it. On a side note, I would say that this is the place where I reached the "relevance limit" of this learning project: in particular, lists are being implemented.. with lists! I guess that if I had gone the hardcore way, I would have implemented my own lists from lower-level components, but to keep things simple, I figured this would make an acceptable place to stop (obviously a much sillier place to stop would have been to simply "eval" the whole thing in the first place!).. Anyway, here is the deeper function:
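
To give an idea of what this kind of dispatch can look like, here is a small sketch that maps a few built-in function names onto their Clojure counterparts. The name apply-builtin and the particular set of built-ins are assumptions; the arguments are expected to be already evaluated.

```clojure
(defn apply-builtin
  "Apply the built-in function named f to a sequence of evaluated args.
   Lisp lists are simply represented by Clojure lists."
  [f args]
  (case f
    "+"    (apply + args)
    "-"    (apply - args)
    "*"    (apply * args)
    "<"    (apply < args)
    "car"  (first (first args))
    "cdr"  (rest (first args))
    "cons" (cons (first args) (second args))
    "list" (apply list args)
    (throw (IllegalArgumentException. (str "Unknown function: " f)))))

(apply-builtin "cons" [0 '(1 2)])
```

The car, cdr and cons entries show the "lists implemented with lists" shortcut: they delegate directly to first, rest and cons.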

For user-defined functions, i.e. symbols that can be found in our previously extracted symbol table, we require an extra step. We need to "instantiate" the function, that is, replace all the formal parameter symbols in its definition by their actual value when the function is called. This is accomplished by first creating a map that associates every parameter symbol with its value (using the "zipmap" function), and then passing it to this function, which will do the rest of the job:
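
A minimal sketch of this substitution step, assuming atoms are strings and lists are vectors (the names substitute and instantiate are made up for the example):

```clojure
(defn substitute
  "Walk a syntax-tree node, replacing every atom that has an entry
   in bindings by the associated value."
  [bindings node]
  (cond
    (sequential? node)        (vec (map #(substitute bindings %) node))
    (contains? bindings node) (get bindings node)
    :else                     node))

(defn instantiate
  "Replace the formal parameters of a function body by the actual values."
  [params values body]
  (substitute (zipmap params values) body))

(instantiate ["n"] [5] ["*" "n" "n"])
```

Note that this purely textual replacement is naive: it does not know about scoping, so an inner definition reusing a parameter name would be rewritten too.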

One thing I really wanted to be able to do, and was a bit anxious about because I didn't dare to "execute it in my head" before trying it.. was recursion, which I was happily surprised to see handled gracefully:
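
To illustrate why recursion comes out naturally with this approach, here is a compact, self-contained sketch of an evaluator working by substitution. Everything in it (the names substitute, evaluate and fact-table, the string/vector representation, the tiny set of built-ins and the presence of an "if" special form) is an assumption made for the example.

```clojure
(defn substitute
  "Replace every formal parameter found in node by its bound value."
  [bindings node]
  (cond
    (sequential? node)        (vec (map #(substitute bindings %) node))
    (contains? bindings node) (get bindings node)
    :else                     node))

(defn evaluate
  "Evaluate a syntax-tree node against a table of user-defined functions."
  [table node]
  (cond
    (sequential? node)
    (let [[head & args] node]
      (cond
        ;; Special form: only the chosen branch is evaluated.
        (= head "if")
        (let [[condition then-form else-form] args]
          (if (evaluate table condition)
            (evaluate table then-form)
            (evaluate table else-form)))

        ;; User-defined function: instantiate its body, then evaluate it.
        (= "function" (first (get table head)))
        (let [[_ params body] (get table head)
              values (map #(evaluate table %) args)
              forms  (substitute (zipmap params values) body)]
          (last (map #(evaluate table %) forms)))

        ;; Built-in function (a deliberately tiny set).
        :else
        (let [values (map #(evaluate table %) args)]
          (case head
            "*" (apply * values)
            "-" (apply - values)
            "<" (apply < values)
            (throw (IllegalArgumentException.
                    (str "Unknown function: " head)))))))

    ;; Integer literal still in token (string) form.
    (and (string? node) (re-matches #"-?\d+" node))
    (Long/parseLong node)

    ;; Anything else (e.g. an already substituted number) is returned as is.
    :else node))

;; (defun fact (n) (if (< n 2) 1 (* n (fact (- n 1)))))
(def fact-table
  {"fact" ["function" ["n"]
           [["if" ["<" "n" "2"]
             "1"
             ["*" "n" ["fact" ["-" "n" "1"]]]]]]})

(evaluate fact-table ["fact" "5"])
```

Each recursive call is just another lookup in the same symbol table followed by a fresh substitution, and the "if" form stops the descent because the branch that is not taken is never evaluated.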