a collection of lengthy notes on how I am usually coward to do what is wrong and how I am sometimes brave to do what is right. Bits of management and self management, .NET programming, Clojure parenthesis and lots of design mistakes are the things one will find here.

пятница, 9 мая 2014 г.

How to Make an SQL from Clojure or My Macros Kata

Despite the fact that I started learning
Clojure about 2 years ago, until recently I didn’t explore, maybe, the most
fascinating part of the language – macros. I did try to compose simple macros
once or twice, but generally they failed to grasp my attention and I never
understood how they work and, more importantly, when should I use them. To fix
this, last month I decided to implement something interesting with the
help of macros and here I will show what I did and sum up what I
learnt.

The example that I’ve taken might be not the
most suitable for the goal, but it worked out quite well. The idea was to
allow for writing SQL-like queries against the lists of
Clojure dictionaries or records – basically anything that allows key-based
access to elements. The value of such a thing is doubtful, but SQL is a
well-known Domain Specific Language and the fact that macros are considered a great tool for creating new languages in Clojure made me chose it as an exercise. Besides, I remember that there was an exercise like this –
similarly with SQL-like queries – in the Joy of Clojure book, which made
me think that this task suits my learning goals well. However, I intentionally
avoided peeking into the book to not spoil my exercise and, kind of, do
everything myself without borrowing others’ ideas and solutions.

The results at which I arrived are quite
fascinating: I am now able to write almost real SQL against lists of
records. The key difference between my queries and SQL is the use of keywords
as the names of fields to select and filter on. In addition to that, well, my selects come in parenthesis, because I am still writing LISP. Otherwise,
everything looks very similar to the queries one would execute against a
database and that feels pretty awesome. Now, let me share my excitement and
show you the code.

In the first article we will begin with simply
selecting some fields from the records and add where clause for filtering. In the second
one I will show how we can enable joining tables together and
ordering the results. So, what we want to achieve at the first step is to be
able to write a query like this:

This is not a very difficult task, because all
that we have to do is limit the set of key-value pairs in each dictionary in
the resulting list to only the keys, which are specified in the query. On the
other hand, we have this from symbol to treat, but that’s quite easy as well.

It turns out that similarly to the functions I
can’t easily write a macro, which solves precisely one problem, so several
things are going on in select. First, we have to get hold of a set of keys denoting fields to select.
All of them come before the from word and for now we can assume that
everything that goes before it is a field name. This said, we just take
elements of the argument list until we meet from,
push them into a set and bind it to the fields symbol. In
a very similar fashion we extract the table, from which we want to select
records – I expect that it is found right next to from,
so I skip everything until from as well as from
itself and grab the first element of the remaining sequence. Having fields and
source, we can actually define what our macro will return to the caller. We
simply map a function that filters each record according to
the presence of the key in the fields set and composes a hashmap from the
resulting key-value pairs onto the “table”. Although this doesn’t feel like the most efficient
and nice solution, I’m okay with it for now.

Because we
are writing a macro we want to return something that will be evaluated upon
execution of the program – that’s why we syntax-quote our map
call with the `. Syntax-quoting prevents the form from
being evaluated, but unlike simple quote (‘) it allows us to avoid unquoting map,
fn and all the other stuff that is used inside it. One
of the problems with quoting surfaces when we want to use a binding inside the
body of a macro – in our case with fn. The idea
is that binding requires a symbol and we need to introduce it somehow. There
are several alternatives, but the simplest one is to use the gensym thing through suffixing a symbol with # character. When encountered with it inside a syntax-quoted form, Clojure will generate a non-qualified symbol,
which we can then use. Finally, in select we need to unquote fields
and source so that proper stuff will be inserted in the
respective places. If you forget to unquote these symbols, syntax reader will
try to resolve them to fully-qualified ones. This may result either in a
failure in case you don’t have such a symbol outside the macro, or in a subtle
failure – if you have one. The latter situation might lead to great confusion: the code that used to work will at some moment get broken because its
surroundings have changed. Fortunately, in most cases such an error will
surface quite soon if not instantly. Having done all these things right, we can
check whether they work and, fortunately, they do:

Let us try
to understand what did macro allow us to do here that a function wouldn’t
permit. Most notable thing is our use of the from
symbol. Despite the fact that it is not bound to anything we can still utilize
it in the macro – as a separator. A function wouldn’t allow that because it
would try to evaluate all the arguments passed into it and fail with from.
Macros, on the other side, receive their arguments in unevaluated form, which
allows us to play with them in different and strange ways. Note that we could
achieve the same behavior with a function if we used, for example, a keyword :from
instead of the pure symbol, so there is actually little value in using macros
like this. However, knowing that Clojure allows us to write down some utterance
and can live with it is pleasant and inspiring.

Now, when we
can select stuff, let us introduce filtering. This task is less trivial not
only due to more complex constructs needed for conditions, but also because
they will come in infix notation, which is natural for SQL, but hardly a common
thing in LISPy Clojure. Even though this may sound difficult, it boils down mainly to
tossing the list of arguments and just requires covering a couple of cases and careful
use of recursion.

To make
filtering possible we will define a macro that creates a Clojure-evaluatable
condition from an SQL-like one. First, we will try to cover the most simple
case – something like select :a :b from table where :a < :d.
Our new macro – let’s call it condition – will get the :a < :d
part of the above query from the select and
return something like (< (:a table) (:d table)) – the
piece, which Clojure will easily consume. The first version is going to be
ve-e-ery simple:

(defmacro condition [t k1 op k2]
`(~op (~k1 ~t) (~k2 ~t)))

The only
thing that we do here is changing infix notation into prefix one and accessing
the values from a record with the keys specified in the condition. We can use macroexpand to check that condition easily treats simple cases:

The fact
that the macro fits easily into one line makes me ask myself whether there is
any reason to writing it. However, instead of trying to find an answer I will
simply fix the cause and… make it longer. One thing that condition
misses is the ability to consume literals. For example, it is pretty natural to
write a filter like this: where :a > 2 and we definitely
should support this. We will do it the simple way – if we face a keyword in a condition,
we will use it to get the corresponding value from the record and inject it in the
resulting code. Otherwise, we won’t do any transformations to the value and
will include it into the result as is:

Now condition
treats both keys, which denote fields of the records, and mere numbers or
strings equally well allowing us to filter both based on relationships between
fields and on their comparison with some external values. What makes me feel
particularly great about the new version of the macro is that there are no
checks for keywords in the code that it outputs – it determines whether it
looks at a keyword in the let form, which is not a part of the resulting
code. This makes the output of the macro shorter and a bit more efficient. To
see it we can do macroexpand as usual:

Our condition
macro can’t handle neither complex nor long conditions so far. We will make it
almost universal a little bit later, but now let’s take a step back and
assemble condition and select together. When faced with this task
first I did introduce another level of indirection in the form of additional
helper function, which would wrap the condition. However, it turns out that one
can do this with much less clutter. After some experiments, I ended up using a
lambda expression in the body of select:

In the
snippet above we have added another binding to take hold of the conditions
passed into select. We assume that they go after the where
word and thus extract them from the arguments list similarly to the fields keys
and the table. Then, in the body of the macro we introduce filtering, which is performed
with a lambda making use of our condition macro and
the list of conditions in infix notation. The most surprising part is that this
thing actually works:

That’s a
wonderful achievement, we can write SQL queries in Clojure! Well, almost. For
now, we miss a couple things including joins, ordering and more complex
conditions. The latter seem to be the most natural thing to address next, so let us enable
queries like (select :a :b from table where :a <= 2 or :c =
“400”). At the current stage trying to execute something like
this will lead us to a “Wrong number of arguments passed to the condition
macro” error, which is a not so often case when the error
message is actually helpful. We do not in fact allow conditions composed of more
than three elements and that’s what we will fix right now.

The most
trivial way to allow for the longer conditions like :a <= 2 or :c =
“400” is to introduce another version of condition
macro, which would take 8 arguments: [t k1 op1
k2 link k3 op2 k4]. Here k3 and k4
are either field keys or values (:c and “400”
in the above example) similar to k1 and k2,op2 is a comparison operator (=)
like the op1 and link is a Boolean function (or),
which glues two parts together. Now, 8 arguments is quite too much even if we
try to bring SQL into Clojure. If you don’t think so, it is not difficult to
come up with a three-part condition which would require 12 arguments with even
more stupid names. Fortunately, we don’t have to write all these things out
because the similarity between k3 op2 k4 and k1 op1 k2
suggests that we can generalize the implementation to allow arbitrary long
conditions with quite a short and concise piece of code:

First of
all, the version with 4 arguments remains intact meaning that we don’t break the
existing code, which is great. Another good thing to note, is that we reuse it
inside the version with the longer argument list – there is no need to write
the checks for keywords and move arguments once more, given that we have
written this code before. Moreover, this new version of condition
might reuse itself in case the other part of arguments list includes more
than three elements. This basically means that in addition to the original goal
we just allowed writing conditions like :a = 1 or :a = 2 or
:a = 3 or :a = 5 or :a = 6 or :a = 7 at no cost. Recursion is a great thing
by itself, but it is even greater when combined with argument lists of
arbitrary length. Especially so when it works:

This all
looks incredible, but there is another crucial thing that we miss – conditions
can be complex, which means that we might want to include prioritization with
parenthesis into them. This could be a difficult and messy thing to implement
should we write in something that is not LISP. However, with Clojure macros
everything wrapped in parenthesis is a list and can be treated as a list. The
incredible power behind this idea becomes clear once we see how little code we
have to add to the existing condition macro to dramatically increase allowed complexity of
conditions:

As you can
see, the only new thing in the macro is another override accepting two
arguments: the usual record t and a complex
condition. Having taken the complex part it basically unwraps it and calls itself
with the extracted elements. As a positive side effect, this little addition
drops the optional parenthesis in case someone decides to surround the entire
restrictions list with them.

Once again,
recursion and pattern matching do a lot of hard work for us. However, even
though this feels like magic, it isn’t and there is an input that we should
definitely handle, but don’t treat well so far. That’s the first of the above examples but
with reversed order of restrictions: (select :a :b from table where (:d <
3 and :b > 300) or :a = 2) yields an error. The problem is that we
allow parenthesis only in the tail position of the conditions list. On the
other side, should we do TDD, tests would tell us that we have already broken
the select macro – it doesn’t work without filtering anymore, because no matter
if there is the where word and something after it in the
arguments list, we always try to construct a filter. Since condition macro
doesn’t work with a single argument, record, it rightfully fails. Here is the
final version of the macro, which addresses both issues:

To handle queries
without filtering we simply introduce another overload of condition for empty
restriction list that returns quoted truth (this even sounds cool!). When we
invoke it from select, true
ends up in the body of the filter function making it allow every record that it
sees. As for making parenthesis work in any position, to allow this we have to
do a little bit more work. The idea is that when introducing prioritization we made our
older assumptions about what is a comparable value (k1,
k2, etc.), what is a comparison operator (op,
op1 and op2) and what is a Boolean link
between several restrictions fallible. However, to fix this we can just check
whether we look at something enclosed in parenthesis and handle the thing
accordingly if we do. That’s what the new seq? parts
are for. Even though this makes the macro grow in size significantly, I believe
that’s a fair price for almost universally working solution:

Now, that we
achieved an important milestone of selecting stuff from lists of records or
dictionaries and filtering it freely, let me sum up what I learnt from the
exercise:

Macros really make one think in a
different way than when composing mere functions. Even though it is hard to communicate this feeling, writing code that produces
code is whole another sensation than writing code that produces data.

On the other side, Clojure does its best to make code as similar to data as possible. Storing programs in lists is cool, because here and there this saves you from extra difficulties.

Arguments passed to a Clojure function
are evaluated first; those passed to a macro are not. This makes a huge
difference.

You can’t apply a macro, but you can
splice-unquote arguments for it.

In a macro you can have code that it
will return – usually quoted – and code that determines what will be returned
and does not show up in the macro’s result.

Doing katas is worth the effort even if
you don’t feel like it helps you understand anything. If something feels too
much for you just force yourself into it and logic will gradually surface. To some extent learning is like violence.

If you don’t refactor, you might be
missing a chance to understand both your code and your tools.

Explaining what you have done in a
sum-up blog entry might feel stupid, but if you are not doing this you are maybe missing another chance
to understand both your code and your tools.

For now I didn’t
cover a couple of things, which one expects from SQL. Most important are joins
and ordering – these I will implement in the next article. Besides,
there are a lot of minor features like, for example, selecting all the fields
from a table with a star (*). This particular one is extremely easy to introduce,
so I won’t show it here – you can take a look at the code on GitHub if
you want.

Комментариев нет:

Отправить комментарий

About

I work as a team-lead at Acumatica, the Cloud ERP. Besides, I hack code on my own for the sake of learning, fun and whatever else that programming can bring. To reflect on these activities and structure my thoughts and ideas better I also maintain a little blog.