Hash Table Syntax in Common Lisp

I sometimes get caught up in thinking about what a language should do for me that I forget what a language allows me to do. Take, for instance, the hash-table in common lisp. On first glance, it seems a bit unwieldy, because building it up is somewhat annoying for small cases. Of course we can use associative arrays, but that comes with its own annoyance (such as having to specify to use #'equal as the test when looking it up if you key on strings). What I'd really like to have is a new kind of syntax:

{:key1 => "value1", :key2 => "value2"}

It is handy to have this succinct declaration at our disposal, and it's a bit annoying that CL doesn't have it off the bat. And getting caught up in that annoyance I forget that there is little standing in the way of creating it. But first let's start with something simpler:

{:key1 "value1" :key2 "value2"}

Setting up the basic reader macro

Let's turn this into a hash-table. To do so takes two steps: define a reader macro for the { character that splits the list into pairs, then generate code for the hash table after. First, the pair generation.

There's a few things going on here. The first is simply defining the reader for {, which gets us started. Then the body of the loop reads a key and a value and keeps on going until we hit the end which is denoted by }. We set that second macro-character on a readtable that is a shadowed copy of the main readtable so as to not mess around with people trying to us } as an every day character. The nice part about this is that the reader will do the right thing and stop reading the previous atom just before the } character comes up. And this will generate pairs:

So now we'd actually like a hash map to be born out of this, which means taking those pairs and doing something with them. We'll modify the above to save them in a variable which we'll use in a quasiquoted return value:

Adding some syntactic sugar

However, we'd like it to be a bit prettier, so we'll add in some commas and arrows. What we'd really like is what I described above:

{:key1 => "value1", :key2 => "value2"}

All of this is really just syntactic sugar. Because each key and value are atoms in and of themselves, we can stop right here and create basically a new kind of list that should always have an event number of atoms and is surrounded with '{}'. But because we have fallen in love with Ruby, we should go all the way, and read in the arrows and commas, check to make sure they are arrows and commas, and discard them anyway. Unfortunately, commas already exist in the read table, and they'll throw a nasty error if you read them. However, we can't just blow away the #\, character in the readtable like we did with the #\} character because it's often used (we'll just ignore legitimate uses of the #\} character in functions like 'weird-func-name-}', as I've not yet seen a function named that). What we'll need to do is enhance that loop to include two new reads:

We now have a new syntax. It should even work with weird quasiquote macros like this:

(defmacro hash-for-val (val)
`{:key1 => ,val})

I don't know if you'd ever need a macro like that, but the comma works just like you'd expect, and indeed if you do run that, the following will work:

(gethash :key1 (hash-for-val "g"))

Adding Pretty Printing

The last part of what we'd like to do here is make it so that when lisp prints out a hash-table, it prints it in the form that we could read back in by pasting. To do this, we need to ensure the pretty printing is on, and set the pretty print dispatch for the hash-table type. This is fairly easy to do, but it's worth noting here that maphash does indeed not work like mapcar (it returns nil), so we'll have to use loop to convert it to a list of pairs so we can print it with format:

(set-pprint-dispatch'hash-table(lambda(str ht)(format str "{~{~{~S => ~S~}~^, ~}}"(loop for key being the hash-keys of ht
for value being the hash-values of ht
collect (list key value)))))

After this, this is what you should be able to do on the REPL:

CL-USER> {:test => 9, "foo" => 3}
{:TEST => 9, "foo" => 3}

Of course, this abstraction is a little bit leaky with regard to normal read macros. When you actually do the following:

(read-from-string"{:test => 9, :foo => 3}")

You'll get a large s-expression that would build up the hashmap for you. However, this is not how other reads work. For instance, using the #S reader macro, you'll get actually get out what you put in, as follows:

Whereas if you actually type #(FOO :A N :B M) into the REPL, it will attempt to evaluate N and M. When creating my own reader macro, I could either get it to act like #S does from the REPL (by way of creating the code that creates the hashmap), or as it does from the reader (by actually creating the hashmap at read time, and not evaluating any of the symbols). Perhaps there is a way to make it do both, but I could not find it trivially.

In conclusion, I usually forget, when I'm bemoaning the lack of some nice syntax that some other language has, that Common Lisp can become that for me. But there is a danger here. The language I'm now using isn't really Common Lisp anymore, but some dialect that only I know. If I started adding some more extentions like [] for arrays, then suddenly no one will know how to read the code but me. Because I use CL alone, in a vacuum, this actually works out pretty well. However, if you're going to be creating a library for others to consume or read, then maybe you should abstain from too much weirdness. And you should definitely not ever modify the end programmer's reader without their permission, as that will just cause all kinds of problems.

And so we come to the sliding scale: Should we have complete flexibility, or complete specification, or somewhere in between. Personally? I like the flexibility, and that's why I find my home in Common Lisp.