clojure

I didn’t intend to wait a month between installments, but here we are. When we left off, we were discussing a couple of implementations of exponentiation, and how to print things out. We made it up to page 35.

Next, the book discusses the main Lisp data structure: the list. It discusses how Lisp assumes that in any list, the first item in that list is a command of some sort, and to change that behavior, you place a single quote, ', before the list. Clojure has the same expectation, so that if you enter this at a REPL:

(foo bar baz)

you will get the following error,

CompilerException java.lang.RuntimeException: Unable to resolve symbol: foo in this context...

indicating that it tried to execute that list as a function call, but failed to find a function called “foo”. Just as in Lisp, you can quote the list by prefixing it with a single quote. Thus, this

'(foo bar baz)

is now treated as data, instead of a function call. Similarly, this

(expt 2 3)

is a call to the expt function, while this

'(expt 2 3)

is a list with three elements.

Barski then goes into a lengthy discussion about the list and its structure, specifically the cons cell. In Lisp, a cons cell is a pair of items, and its list is implemented as a linked list, with the second part of each cons cell linking to the next cons cell in the list. Clojure does not have a cons cell, and so does not implement its lists in terms of them. Lisp uses cons cells individually as simple pairs, too, and you can simulate this use by using a 2-element vector, like so

(def pair [:a :b])

Beginning on page 38, Barski starts discussing the various functions that operate on lists. I will provide the Clojure equivalent functions, where possible.

The cons Function

In Lisp, you create cons cells using the cons function. The result is a cons cell with the two arguments set as the left and right portions of the cell, or just a left-side, if the right is nil. You can simulate this, using the vector function

(vector 'chicken 'cat)

or as a literal vector

['chicken 'cat]

Clojure does have a cons function, but it functions somewhat differently than its Lisp counterpart. It prepends its first argument onto its second argument, which must be a list-like thing, returning the result as a list. So this in Lisp

(cons 'chicken 'cat)

can’t be done using Clojure’s cons function, since the second argument is not a list-like thing. If you try, Clojure throws an exception, because it cannot treat the symbol cat as a sequence.
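A quick way to see the failure without crashing your REPL session is to wrap the call in a try. This is only a sketch; the name result and the keyword :not-a-sequence are made up for the illustration.

```clojure
;; Consing onto a bare symbol fails, because a symbol cannot be
;; treated as a sequence. We catch the exception and return a marker.
(def result
  (try
    (cons 'chicken 'cat)
    (catch IllegalArgumentException e
      :not-a-sequence)))

result ; :not-a-sequence
```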

If you need to create a sequence of a chicken and a cat, use one of these:

(vector 'chicken 'cat)
['chicken 'cat]
'(chicken cat)

Obviously, the first two create vectors, while the last one creates a list.

In Lisp, the empty list and nil are equivalent, but not so in Clojure. However, these two examples of consing a value onto nil, and a value onto an empty list, behave essentially the same, returning a list of just the first element.

(cons 'chicken 'nil) ; (chicken)
(cons 'chicken ()) ; (chicken)

The next three examples function effectively the same, though remember that what you are getting back is not a linked list of cons cells, as you would in Lisp.
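The examples themselves are not shown here, so the following is a sketch of what consing onto lists looks like in Clojure, reusing the pork, beef and chicken symbols from the rest of this post.

```clojure
;; Prepend one item onto an existing list.
(cons 'pork '(beef chicken))                  ; (pork beef chicken)

;; Build a list up from the empty list, one cons at a time.
(cons 'beef (cons 'chicken ()))               ; (beef chicken)
(cons 'pork (cons 'beef (cons 'chicken ())))  ; (pork beef chicken)
```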

The car and cdr Functions

Lisp has the basic functions car and cdr for accessing the first element of a list, and the remaining elements of a list, respectively. These names are directly related to the original hardware on which Lisp ran. Clojure does not have functions with these names, but it does provide its own functions that give the same results.

Instead of car, Clojure gives us first, which actually is a better name for what it does. It works like this

(first '(pork beef chicken)) ; pork

Instead of cdr, Clojure actually has two functions, next and rest. With a listy thing of 2+ elements, both of these functions behave the same, returning everything but the first element.
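For instance, with the same three-element list used for first, the two calls should give the same result:

```clojure
(next '(pork beef chicken)) ; (beef chicken)
(rest '(pork beef chicken)) ; (beef chicken)
```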

They differ, however, on what happens if there’s nothing after the first element, or the entire list is the empty list. next will return nil in both these cases, while rest will return an empty list. As I said, in Lisp, these are the same thing, but in Clojure, they are very different. The empty list is truthy, while nil is not

(when nil :truthy) ; nil
(when '() :truthy) ; :truthy
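Coming back to next and rest, here is a small sketch of the two edge cases described above, a one-element list and the empty list:

```clojure
;; Nothing after the first element
(next '(pork)) ; nil
(rest '(pork)) ; ()

;; The empty list itself
(next '())     ; nil
(rest '())     ; ()
```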

The c*r Functions

Lisp defines combinations of car and cdr to allow you to get the second item, the third, the second item from the remainder of the remainder of a list, etc. Most CL implementations provide these functions up to four levels deep, with names such as cadr, cddr, caddr and cadddr.

And let’s be honest, keeping all those ‘a’s and ‘d’s straight can be pretty confusing (at least, it is to me).

Since Clojure doesn’t have car and cdr, it also lacks these functions. It does have one analogous function, and that is for the CL function cadr, which provides the car of the cdr of a list, also known as the second item. Thus, Clojure has second that does the same thing,

(second '(pork beef chicken)) ; beef

but that’s it for built-ins. You can roll your own versions of these CL functions, but from what I’ve read, this is considered a code smell. Even though I don’t think you should try to implement these functions, let me show you how you might do it, if you wanted to.

Instead of explaining each one, and its possible implementation in Clojure, I’m just going to include the code for all of them in one, big blob.
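What follows is not the original blob, just a minimal sketch of how such functions could be built from first and rest with comp. The names mirror the CL ones, but none of them exist in Clojure itself, and the edge cases differ from CL (rest returns an empty list where cdr would return nil).

```clojure
;; Hypothetical stand-ins for some of CL's c*r functions.
;; Read the a's and d's from right to left: cadr is the car of the cdr.
(def car first)
(def cdr rest)
(def caar (comp first first))
(def cadr (comp first rest))            ; same result as second
(def cdar (comp rest first))
(def cddr (comp rest rest))
(def caddr (comp first rest rest))
(def cdddr (comp rest rest rest))
(def cadddr (comp first rest rest rest))

(cadr '(pork beef chicken))      ; beef
(cddr '(pork beef chicken))      ; (chicken)
(caddr '(pork beef chicken))     ; chicken
(cdar '((peas carrots) beef))    ; (carrots)
```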

So, if you shouldn’t create analogous functions, how do you get at the elements you need? One way would be to use Clojure’s destructuring. I’m not going to go into a full explanation of how destructuring works, but suffice it to say that it’s a way to assign elements of a listy thing to individual locals.
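As a small sketch of the idea, the let below pulls the first two items of a list into locals a and b, and collects whatever is left into more (all three names are arbitrary):

```clojure
;; Sequential destructuring: a gets the first item, b the second,
;; and more gets the remaining items as a sequence.
(def result
  (let [[a b & more] '(pork beef chicken fish)]
    [a b more]))

result ; [pork beef (chicken fish)]
```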

Chapter three of Land of Lisp is all about Lisp syntax. This post will be sort of scattered as far as content goes, since the chapter covered a lot. Many things are the same in Clojure, but there are some serious differences. The first is how to define a function.

Defining Functions

In Lisp, you use defun, but in Clojure, it’s defn. Here’s a square function in Lisp.

(defun square (n)
  (* n n))

And here’s the same function in Clojure. Notice that the function arguments are enclosed in square brackets (it’s a vector), instead of parens.
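A sketch of what that Clojure definition could look like is below; the wording of the docstring is invented for the example.

```clojure
(defn square
  "Returns the square of n."
  [n]
  (* n n))

(square 5) ; 25
```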

That string is known as the docstring. It stays with the function, and is available in the REPL by running the doc function, like this (doc square). Lisp also supports docstrings in functions, but it comes after the argument list, instead of before. While docstrings are optional, I highly encourage you to include them. They can span multiple lines, and since they stay with your function, they are useful from the REPL.

Equality

In Lisp, there are many functions for determining equality, and you have to choose the right one for any given circumstance. Among these are eq, equal, equalp, and a few others. In Clojure, you almost always just use =. If you’re coming from Java, you know = by itself is assignment, not an equality check. For that, you have to use ==, but for objects even that only checks reference equality, which is not always what you want. In Clojure, = compares by value and does what you want in nearly every circumstance. It is your friend. (Clojure also has == for comparing numbers of different types, and identical? for reference equality, but you will rarely need them.)
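A few REPL examples give a feel for how = compares values rather than references. The one surprise is numbers of different types, which is where == comes in.

```clojure
(= 1 1)               ; true
(= "pork" "pork")     ; true
(= [1 2 3] '(1 2 3))  ; true, sequential collections compare by content
(= {:a 1} {:a 1})     ; true
(= 1 1.0)             ; false, a long and a double are not =
(== 1 1.0)            ; true, == compares numeric value only
```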

Exponentiation

Starting on page 34, there are a few examples using the expt function, which raises its first argument to the power of its second. This is a built-in function in CL, but Clojure doesn’t have one. You could use Math.pow from Java (written Math/pow in Clojure), but it only works with doubles, and once the numbers get really large, the result prints in scientific notation.
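As a rough illustration of the Java route, these two calls show the double result and the scientific notation for a larger power:

```clojure
;; Math/pow always returns a double, even for whole-number arguments.
(Math/pow 2 3)   ; 8.0
(Math/pow 2 100) ; 1.2676506002282294E30
```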

(In case you haven’t seen it, appending an N to a number literal causes the number to be of type clojure.lang.BigInt. Appending an M makes it a java.math.BigDecimal.)

You can write your own exponentiation function that gives better results than using the one from Java. Here are two different ways to write it. Both versions are tail-recursive, which means they won’t exhaust the stack, but the first uses a nested function, while the second is recursive on a loop. Here’s the nested function version
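The listing itself is not shown here, so this is a minimal sketch of a letfn-based version. It assumes n is a non-negative integer, and the accumulator name acc is my own choice.

```clojure
(defn expt
  "Raises x to the power n, where n is a non-negative integer."
  [x n]
  (letfn [(rexpt [n acc]
            (if (zero? n)
              acc                          ; exponent used up, return the accumulator
              (recur (dec n) (* acc x))))] ; x comes from the enclosing function
    (rexpt n 1)))

(expt 2 3) ; 8
```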

Notice the letfn that contains a local function called rexpt. This function does all the work, and is called as the last line of the main function. It takes a parameter to be used as an accumulator, and this is returned once the exponent is decremented to zero. This nested function is also a closure, because the value of x is referenced directly. We don’t need to change it like we do n, so we just use its name.

Now, here is the version that uses loop. While CL has a loop macro, Clojure’s loop is completely different. All it does is provide a recursion point. This means that when you use recur later, execution will jump back to where the loop call is, instead of back to the beginning of the function. The locals declared in the loop’s vector are rebound with the values specified by the recur call. I think this version is easier to understand than the first one.
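Again as a sketch rather than the original listing, a loop-based version might look like this, with the same assumption that n is a non-negative integer:

```clojure
(defn expt
  "Raises x to the power n, where n is a non-negative integer."
  [x n]
  (loop [n n      ; local n, starts as the argument n
         acc 1]   ; running product
    (if (zero? n)
      acc
      (recur (dec n) (* acc x)))))

(expt 3 4) ; 81
```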

Notice that the code inside the loop is identical to that in the rexpt local function from the previous example. It’s just not wrapped inside another function. Also of note is that in the loop’s binding vector we bind n to n. This is a common technique, and will result in a local called n being assigned the value of the passed-in n. The local n can then be decremented with each recursion, without affecting the outer n.
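To try different argument types at the REPL, something like the following should do. A compact loop-based expt is included only so the block runs on its own.

```clojure
(defn expt [x n]
  (loop [n n, acc 1]
    (if (zero? n)
      acc
      (recur (dec n) (* acc x)))))

(expt 2 3)    ; 8
(expt 2.0 3)  ; 8.0
(expt 2N 100) ; 1267650600228229401496703205376N
```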

Notice that passing in an integer results in an integer. Passing in a double results in a double. And passing in a BigInt results in a very large number (Hint: scroll horizontally… it goes on for a while).

Printing Things

CL uses (princ), (prin1), (print), etc., to output things to the console. In Clojure, you use (print) and (println).
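A short sketch of the printing functions follows. It also uses prn, which prints strings with their quotes (roughly the role prin1 plays in CL), and with-out-str, which captures printed output as a string.

```clojure
(print "hello")   ; prints hello with no trailing newline
(println "hello") ; prints hello followed by a newline
(prn "hello")     ; prints "hello" with the quotes, then a newline

;; with-out-str returns whatever the body printed
(with-out-str (print "a" "b")) ; "a b"
```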

To Be Continued…

Just like in Lisp, Clojure uses let to define locals. The only real difference is that Clojure uses a vector of names and their bindings, whereas Lisp uses a nested list.

This Lisp code

(let ((a 5)
      (b 6))
  (+ a b))

looks like this in Clojure

(let [a 5
      b 6]
  (+ a b))

I think the Clojure way is a little easier to read.

The biggest difference between the two is when it comes to local functions. CL has flet for defining local functions, and labels for defining local functions that need to be able to call each other (or call themselves recursively).

If the functions need to reference each other, in CL you have to use labels, instead of flet. Here’s how that looks (compared with flet, the only difference is the name of the form; the arguments remain the same)

(labels ((a (n)
           (+ n 5))
         (b (n)
           (+ (a n) 6)))
  (b 10))

In Clojure, you don’t need to use anything other than letfn, because it already supports the recursive nature that labels provides

(letfn [(a [n]
          (+ n 5))
        (b [n]
          (+ (a n) 6))]
  (b 10))
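To see that letfn really does allow functions to call each other in both directions, here is a small sketch with two mutually recursive locals. The names my-even? and my-odd? are made up, and the approach is only sensible for small non-negative numbers, since each call uses a stack frame.

```clojure
(def ten-is-even
  (letfn [(my-even? [n]
            (if (zero? n) true (my-odd? (dec n))))
          (my-odd? [n]
            (if (zero? n) false (my-even? (dec n))))]
    (my-even? 10)))

ten-is-even ; true
```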

Finally, if you have local functions and other local bindings you need to establish, you can use a let, but mutual recursion is not supported, because each binding can only see the ones before it. This is sort of like CL’s flet, but you can also use it for binding locals that are not functions

(let [a (fn [n]
          (+ n 5))
      b (fn [n]
          (+ (a n) 6))
      c 10]
  (b c))

I think the way Clojure uses square brackets in certain places that CL uses parentheses makes the code easier to read, overall.

I read Conrad Barski’s excellent book Land of Lisp a couple of years ago, and worked through all the examples using CLisp, but I thought it might be fun to go through it again, but use Clojure instead. Other people have done it already, but what’s one more, eh?

So, the first example is for a program to guess a number you are thinking of. In Lisp, defparameter allows you to rebind values, but a Clojure def is not meant to be changed once it is set. Using a ref gets around this, though it is a bit clunky (since refs are intended for coordinated, multi-threaded access). The code is not great, and you wouldn’t write a Clojure program like this (or a Lisp program, really); it’s just to get the discussion moving. Better code is coming.

Anyway, here’s the number-guessing program in non-idiomatic Clojure. To run it, load it into a REPL, then execute (guess-my-number). If you are so enraptured with the game that you want to play it again, execute (start-over) and then (guess-my-number).
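The listing is not shown here, so the following is a sketch of how the book's guessing game could be written with refs. Only guess-my-number and start-over are named in the text; the names small, big, smaller and bigger are assumptions borrowed from the book's Lisp version.

```clojure
;; The current lower and upper bounds of the search, held in refs.
(def small (ref 1))
(def big (ref 100))

(defn guess-my-number
  "Guesses the midpoint of the current bounds."
  []
  (bit-shift-right (+ @small @big) 1))

(defn smaller
  "Tells the program its last guess was too high, then guesses again."
  []
  (dosync (ref-set big (dec (guess-my-number))))
  (guess-my-number))

(defn bigger
  "Tells the program its last guess was too low, then guesses again."
  []
  (dosync (ref-set small (inc (guess-my-number))))
  (guess-my-number))

(defn start-over
  "Resets the bounds for a new game."
  []
  (dosync
    (ref-set small 1)
    (ref-set big 100))
  nil)

(guess-my-number) ; 50
(smaller)         ; 25
(bigger)          ; 37
(start-over)
```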

I like to test things out interactively, so I love working with languages that provide a REPL. I’m currently working on a Java project, but Java doesn’t have a REPL. Several languages built on top of the JVM do have them, and these languages can access the Java classes on their classpaths. Groovy, Scala and Clojure are three such examples that I happen to work with.

I got this tip from this response on a Stackoverflow.com post. His tip was for Scala, which looks like this:

The bit between the backticks runs a Maven goal that outputs the jars that your project depends on, and then extracts just the list of fully-qualified jar files to append to the Scala classpath. If you want to use Groovy for your REPL, it would look like this:

If you happened to read my post from the other day entitled My New “Top Artists Last 7 Days” Widget, you know that I went through three iterations of getting it going. The final solution, written in Ruby, worked well. Until bands like Motörhead, Mötley Crüe and Einstürzende Neubauten showed up in the list. At that point, the HTML parsing library I was using would barf, and processing would stop, leaving the list showing on the blog in an incomplete state. It wasn’t the library’s fault; apparently Ruby still has problems dealing with non-ASCII characters. I did everything I thought I needed to do to tell Ruby that it would be dealing with UTF-8 encoding, but it just kept right on barfing.

I was left with only two choices: stop listening to any band with an umlaut in the name (and God help me if any of my Scandinavian bands popped up, with the Ø or å characters), or rewrite the stupid program, again, in a language that I knew could easily deal with UTF-8.

Since I’ve been working in Clojure a lot lately, it seemed like the logical choice. I spent about an hour working on it last night, and I ended up with a working program and a bit more Clojure experience. Here’s the program for your edification, with a description to follow:

I ended up using a library called clj-http to handle the fetching of the URL. It’s a Clojure wrapper for the Apache HttpComponents HTTP client library, and was really easy to use. I’m using Leiningen, by the way, so including clj-http was just a matter of including a line in the project.clj file. I also used a Java library called HTMLCleaner, that fixes broken HTML and makes it available as a DOM. Since it is also in Maven Central, it was easy to include by adding another line to the project file.

The -main function begins on line 38, but all it really does is check that there is a single command-line argument, and exits with a usage message if there is not. It then calls the fetch-data function, which begins on line 20.

On line 21, we declare two locals; one that will contain the results of fetching the web page, and one that is the HTML cleaner. If the fetch of the URL was successful, the status code will be the standard HTTP 200. If we got that, we then open a PrintStream on the filename given, specifying that it should be encoded with UTF-8. (I’ve been working with Java for a very long time, and I always assumed that since Java strings are Unicode, files created with Java would default to UTF-8. That is not the case. That’s why there’s a second argument when creating the PrintStream, and why I’m not using a PrintWriter.) We then print the first part of the output HTML file, set a couple of options to HTML Cleaner that cause it to strip comments, style and script sections from the HTML, and then start doing the real work.
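The encoding point can be sketched in isolation. The snippet below is not the program's code; it writes one line to a temporary file (the file name prefix and the HTML fragment are made up) through a PrintStream created with an explicit "UTF-8" argument, then reads it back.

```clojure
(import '(java.io File PrintStream))

;; A throwaway file standing in for the real output file.
(def tmp (File/createTempFile "artists" ".html"))

;; The second constructor argument names the character encoding.
(with-open [out (PrintStream. tmp "UTF-8")]
  (.println out "<li>Motörhead</li>"))

;; Read the file back with the same encoding; the umlaut survives.
(slurp tmp :encoding "UTF-8")
```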

On line 29, we declare a local called node that will contain the output of HTML Cleaner if it successfully parsed and cleaned the HTML. That’s what when-let does; it assigns the local as long as the function returns something truthy and then executes its body. If that function doesn’t return something truthy, the rest of the code is skipped. We then take the first five elements from the HTML that have an attribute called “class” with a value of “subjectCell”. These are table cells. We then loop over them, extracting the artist and playcount value, and the URL. We do these things in two separate functions.
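Here is a tiny, self-contained sketch of when-let on its own, using a made-up function called describe-first:

```clojure
(defn describe-first
  "Describes the first item of coll, or returns nil if there isn't one."
  [coll]
  (when-let [x (first coll)]   ; body runs only if (first coll) is truthy
    (str "first item: " x)))

(describe-first [1 2 3]) ; "first item: 1"
(describe-first [])      ; nil
```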

The function called get-artist-and-playcount, starting on line 8, takes the table cell as input. It then gets the attribute called “title” and uses a regular expression to pull out the artist and playcount values. If the playcount is the word “once,” it converts it to a 1, so all the values are numeric. It then returns the two values as a vector.
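To illustrate the regular-expression step, here is a sketch that assumes a title of the form "Artist, played 12 times" or "Artist, played once". That format is a guess made for the example, as is the function name artist-and-playcount; the real attribute text may well differ.

```clojure
(defn artist-and-playcount
  "Pulls the artist and a numeric playcount out of a title string.
  Assumes the hypothetical format 'Artist, played N times' or
  'Artist, played once'."
  [title]
  (let [[_ artist plays] (re-matches #"(.+), played (once|\d+)(?: times)?" title)]
    [artist (if (= plays "once")
              1
              (Long/parseLong plays))]))

(artist-and-playcount "Motörhead, played 12 times")          ; ["Motörhead" 12]
(artist-and-playcount "Einstürzende Neubauten, played once") ; ["Einstürzende Neubauten" 1]
```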

The function called get-url, starting on line 14, also takes the table cell as input. It then gets all the “a” elements from the cell (there’s only one), and then gets the “href” attribute’s value, which is the URL.

Back at line 34, we take the three values we extracted with the two support functions and concatenate them together into HTML that will be a single line in an ordered list. We then output all the necessary closing tags to make the HTML well-formed, and we’re done.

While the Clojure code is a bit more dense than the Ruby code, it’s actually four lines shorter. And it handles Unicode characters, which makes me happy.

I’ve known about the FizzBuzz problem for a few years. I’ve written solutions for it in a few languages, but never posted them. I’ve been working with Clojure lately, and after reading articles about how many job applicants can’t solve a simple problem like this here, here and here, I decided to do a Clojure version. (It baffles me that someone who claims to be a developer can’t come up with a solution for this, no matter how good or bad it might be.)

I ended up doing it three different ways. The first is a simple first-cut solution. The second is somewhat better, I think, and the third is a refinement of the second. In all three cases, they use a nested function to do the evaluation, and return a lazy, infinite sequence. Here’s the first
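The code is not shown here, so this is a sketch of what a first-cut version along those lines might look like: an anonymous function passed to map over an infinite sequence of integers. The name fizzbuzz is my own choice.

```clojure
(defn fizzbuzz
  "Returns a lazy, infinite FizzBuzz sequence starting at 1."
  []
  (map (fn [n]
         (cond
           (and (zero? (mod n 3)) (zero? (mod n 5))) "FizzBuzz"
           (zero? (mod n 3)) "Fizz"
           (zero? (mod n 5)) "Buzz"
           :else n))
       (iterate inc 1)))

(take 15 (fizzbuzz))
; (1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "FizzBuzz")
```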

This solution does work, but I have a problem with the fact that the division tests are done twice. I think doing those tests twice increases the chances of making a mistake. The second version does the tests one time, assigning the results to locals. It then checks them for nil, and concatenates them together, relying on the fact that a nil will not print.
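A sketch of that second approach could look like the following. The local names fb, fizz and buzz are assumptions; the key point is that str treats nil as an empty string.

```clojure
(defn fizzbuzz
  "Returns a lazy, infinite FizzBuzz sequence, testing each divisor once."
  []
  (let [fb (fn [n]
             (let [fizz (when (zero? (mod n 3)) "Fizz")
                   buzz (when (zero? (mod n 5)) "Buzz")]
               (if (or fizz buzz)
                 (str fizz buzz)   ; a nil contributes nothing to the string
                 n)))]
    (map fb (iterate inc 1))))

(take 5 (fizzbuzz)) ; (1 2 "Fizz" 4 "Buzz")
```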

In this version, instead of passing an anonymous function to map, I assigned it to a local in a let expression. You can see that I only do the math once, assigning locals with either the appropriate word, or nil. I then check that one or the other of the locals are non-nil, cat them together and return it. If both are nil, the number itself is returned.

The third version is almost identical to the second. The only difference is that the second one used a let expression, and the third one uses a letfn expression. It’s effectively the same thing, but the third one is ever-so-slightly shorter, and I think ever-so-slightly easier to read.
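As a sketch of the letfn flavor (again with assumed names), the helper is declared as a local function instead of being bound to an anonymous fn:

```clojure
(defn fizzbuzz
  "Returns a lazy, infinite FizzBuzz sequence, using a letfn helper."
  []
  (letfn [(fb [n]
            (let [fizz (when (zero? (mod n 3)) "Fizz")
                  buzz (when (zero? (mod n 5)) "Buzz")]
              (if (or fizz buzz)
                (str fizz buzz)
                n)))]
    (map fb (iterate inc 1))))

(take 5 (fizzbuzz)) ; (1 2 "Fizz" 4 "Buzz")
```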