The half-compiler programmer

If you took a compilers course in college, you will have learnt about syntax trees. The primary goal of a compiler is to convert source code into the target CPU’s instruction set. Those who worked with C compilers in the early 1990s will remember that there were separate C compilers for DOS, Windows, Mac, *nix and so on, and programs had to be compiled on each system individually.

Before a compiler outputs executable code, it builds something called an abstract syntax tree (AST), a hierarchical representation of the code.

Take a simple example:

c = a + b

would be converted into:

        [equals]
        /      \
      [c]     [plus]
              /    \
            [a]    [b]

So that’s the job of the compiler (technically, of its parser): take a human-readable, well-understood mathematical expression and convert it into a syntax tree, something we as programmers don’t really care much about.
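To see a real parser do this, Python’s standard `ast` module can parse the same statement. Python is only a convenient stand-in here, and its node names differ from the diagram (the root is `Assign` rather than "equals"), but the shape is the same: an assignment whose target is `c` and whose value is a `BinOp` node for `a + b`.

```python
import ast

# Parse the statement from the example with Python's own parser.
tree = ast.parse("c = a + b")

# The module body holds a single statement: the assignment node.
assign = tree.body[0]
target = assign.targets[0]   # left-hand side: the name c
value = assign.value         # right-hand side: the expression a + b

print(type(assign).__name__)          # Assign
print(target.id)                      # c
print(type(value).__name__)           # BinOp
print(type(value.op).__name__)        # Add
print(value.left.id, value.right.id)  # a b
```

The programmer writes the one-line expression; building and walking this tree is the parser’s job.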

A few days ago I had a huge XML file in front of me that I had to convert to a flat file. For XML conversions, XSLT seems to be the most popular solution. But as I started writing the XSL, I was reminded of my college days reading about compilers, semantic graphs and syntax trees. That’s when I felt like I had borrowed an elephant to move my chairs around, and for a moment I felt like a jackass. What is XSLT making me do? It is making me write complex hierarchical code that looks very much like an AST! But isn’t building the AST the compiler’s job?

So, as a programmer, I have to give up the most intuitive way of expressing logic and start writing code as a hierarchical syntax tree. If compilers were alive, they would either appreciate my generosity or make fun of me for doing half their work.

To me, this case amply shows that XSLT is a superficial construct from end to end. In retrospect, there was absolutely no need for XSLT to be written in XML syntax at all; it could have been a simple set of instructions. Worse still is SharePoint’s CAML query syntax, where what would be a simple SQL-style condition has to be written as a complex hierarchy of nested elements. The complexity grows quickly as conditions are added, because each <And> or <Or> element takes exactly two operands, so every extra condition means another level of nesting. In other words, I feel these languages were designed specifically to kill productivity.
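To make the CAML point concrete, here is a minimal Python sketch that generates a CAML `<Where>` clause equivalent to a three-condition SQL `WHERE`. The field names and values are hypothetical, and the sketch only handles text equality; it relies on `<And>` taking exactly two operands, which forces one extra level of nesting per added condition.

```python
from xml.sax.saxutils import escape


def eq(field, value):
    """Build one CAML <Eq> condition on a text field.

    Field names are assumed to be plain identifiers; only the value is escaped.
    """
    return (f"<Eq><FieldRef Name='{field}'/>"
            f"<Value Type='Text'>{escape(value)}</Value></Eq>")


def and_all(conditions):
    """AND a list of conditions together in CAML.

    <And> takes exactly two operands, so each extra condition wraps the
    previous result in one more <And>: n conditions need n - 1 nested <And>s.
    """
    result = conditions[0]
    for cond in conditions[1:]:
        result = f"<And>{result}{cond}</And>"
    return result


# Hypothetical fields, equivalent to the SQL:
#   WHERE Status = 'Open' AND Owner = 'Alice' AND Region = 'EMEA'
conds = [eq("Status", "Open"), eq("Owner", "Alice"), eq("Region", "EMEA")]
where = f"<Where>{and_all(conds)}</Where>"
print(where)
```

A fourth condition would wrap everything in a third <And>, while the SQL version just gains one more `AND` clause.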

It is natural to think of data models as hierarchical structures, but much harder to think of program instructions in terms of hierarchies. With data structures we are concerned with relations between entities, while programming is about the flow of logic, not a hierarchy of statements. So that’s the mantra: the next time you use XSLT or XML to write code, be aware that the compilers are having a party at your expense.

And that’s when I snapped and switched to a simple Groovy script using MarkupBuilder, which got the job done about an order of magnitude faster.
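The Groovy script itself isn’t shown, so as a rough illustration of the same idea in a different language, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. It reads an XML document and writes one pipe-delimited line per record in plain top-to-bottom code rather than a tree of templates. The input document, its element names (`orders`, `order`, `customer`, `amount`), the `id` attribute and the `|` delimiter are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the real XML from the story is not shown.
SAMPLE = """
<orders>
  <order id="1"><customer>Alice</customer><amount>10.50</amount></order>
  <order id="2"><customer>Bob</customer><amount>7.25</amount></order>
</orders>
""".strip()


def to_flatfile(xml_text):
    """Turn each <order> element into one pipe-delimited line."""
    root = ET.fromstring(xml_text)
    lines = []
    for order in root.iter("order"):
        # Read the fields in the column order the flat file expects.
        fields = [
            order.get("id", ""),
            order.findtext("customer", default=""),
            order.findtext("amount", default=""),
        ]
        lines.append("|".join(fields))
    return "\n".join(lines)


print(to_flatfile(SAMPLE))
```

The logic reads as a sequence of steps (parse, loop, pick fields, join), which is the flow-of-logic style the post argues for.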