Introduction

Hello, and welcome to part two of this language implementation series. In part one, a language most simple was defined in an attempt to introduce a method to parse source code into an Abstract Syntax Tree.

This article assumes that you have read part one so that we can push on with some more difficult concepts, specifically the introduction of variables, if statements and loops, and binary operators (+, -, *, /). Once again, we will be converting the source code into JavaScript, so that you can try executing it in your browser. In part one, the created language controlled a rather useless JavaScript image effect. In this article, such frivolities are dumped in favour of a straight business problem.

The Sample Program

In one of my previous jobs, it seems we were implementing e-commerce websites every other month. In my opinion, the biggest challenge was always calculating the freight cost of an order, especially when multiple items of different sizes and weights were being sent, mixed in with order-regions to affect price. Just when you thought the calculations were pretty good, one order would come along causing a ridiculous freight cost to be calculated. This meant that the code to calculate costs could be complicated, and changed often. What if instead, the rules for calculating the freight cost could be edited directly by the owner of the site?

In order to write this business-rule engine, we need to understand the domain objects, and decide what data will be available for the language. An Order object has a delivery region, customer type ("normal" or "VIP"), and a list of OrderItems. An OrderItem has a unit price, quantity, and weight:

An example of a freight policy could be to charge a fixed rate per KG, where the rate depends on the delivery region. VIP customers get a 30% freight discount. The language FL (Freight-Language) will allow this by having conditional statements, loops, variables, and access to the order data. The following section describes the language in English.

The Language Definition

An FL program is made up of 0 or more statements, and must end by declaring the total cost with the line "Freight cost is x", where x is a number or variable. In the statements, variables can be declared and set (e.g., "set price to 0"), conditional statements can be used (e.g., "if region is "asia" [", followed by 0 or more statements, and ending with "]"), and it is possible to loop through each item in the order (e.g., "loop through order [", then statements where keywords "weight" and "quantity" can be used, followed by "]"). Every statement ends in a semi-colon.

Here is an example:

Set pricePerKG to 2.5;
If region is "asia" [
set pricePerKG to 2.2;
]
Set totalWeight to 0;
loop through order [
Set totalWeight to totalWeight + (quantity * weight);
]
Set freightCost to totalWeight * pricePerKG;
if customer is "VIP" [
set freightCost to freightCost * 0.7;
]
Freight cost is freightCost;

A few of the things which can be achieved in this language is determining the minimum and maximum weights in the order (the < and > operators are also allowed), and nested loops and conditional statements are allowed to any depth. As ever, we need to define this language formally, and a slightly non-standard version of Backus-Naur Form (BNF) will be used as an interim step before defining this language using Irony:

There are many ways to define a grammar, and often a bit of intuition is used, which means it gets a little easier each time you do it. The following points can be said about this definition:

A Statement can be an IfStatement, which can itself contain Statements. It is this recursive declaration which allows the nesting of conditional statements and loops to any level with little difficultly.

Boolean expressions (e.g., "i < 4") and number expressions (e.g., "i + 4") are treated in the same way. This has been done to keep this language simple for the purposes of this article; however, it does allow some nonsensical statements (e.g., "if i + 4 then []"). An alternative would be to declare an Expression as a NumberExpression or a BooleanExpression.

A Statement can be just an expression. While perhaps not required in this language, this is a very common feature in languages (e.g., a method call with a return value can be used as an expression or, when not capturing the returned value, a statement).

Defining the Language in Irony

As was illustrated in part one, the transition from BNF to an Irony compiler is not very difficult. It involves creating a class that extends Irony.Compiler.Grammar, and creating a constructor to declare each of the terminals, and then defining the grammar:

For the sake of brevity, quite a few pieces were left out in that code snippet. While you should see the attached source code for the full version, I would like to point cast your attention to the following points:

All variables (including the globals "region", "customer", "weight", and "quantity") are declared as instances of the Irony-included IdentifierTerminal class. This covers most alphanumeric variable names, and provides a few customisation options. You can also define completely new variables if, for example, your language dictated that each variable name was enclosed in '$' signs. The keywords of the language which could be misinterpreted as variable names (e.g., "loop", "if", etc.) are specified on the instance of the terminal.

The actual values to the region and customer terminals can ("asia", "vip", etc.) use the Irony-provided StringLiteral class, which provides many options for defining how strings are defined in your language (for example, instead of enclosing in quotes, you may wish to enclose all strings in backslashes).

Parsing the Source Code

The source code can be parsed into an Abstract Syntax Tree with the following code:

Assuming the source code is valid, program is now the root node of a tree which can be traversed using the ChildNodes property. In order to visualise this, cast your eyes over the following simple program, which sets the freight cost to $5, unless the delivery is for Africa, in which case a $1 discount is given:

Set total to 5;
If region is "Africa" [
Set total to 4;
]
Freight cost is total;

Because we are ignoring irrelevant terminals such as "[" and "]" by defining those terminals as punctuation, the above program will be parsed into the following Abstract Syntax Tree:

While that may appear complicated, each part of the diagram is rather simple. You can see that a Program is a StatementList and a FreightDeclaration; the expression of the IfStatement includes a BinaryOperator (the equality keyword "is") to compare two values; etc.

Getting to the point where your language can be parsed into an abstract syntax tree is a major milestone. Now all we need to do is generate code from this thing!

Generating Code

In part one of this series, the language was so simple that we could simply loop through the nodes and extract useful information from the tree. Now that we've got recursive loops and the language is a little more complex, it is going to be much more elegant and maintainable if we create a class for each node in the tree, and have each node in the tree take responsibility for its own little piece of code.

As an example of the type of work that each node needs to do, take a look at the IfStatement node in the diagram above. How can this node generate JavaScript for itself and all of its child nodes? It's quite simple really: it needs to create a string with the text "if (" then let the first child node add its own code to this string, then concatenate ") {", then let the second node, which is a StatementList, to add its code, then finish by adding "}". And how does a StatementList generate code? Well, each child node is a Statement node, so it just needs to loop through each child, and get each Statement to generate its own code. Hopefully, you can see that an IfStatement nested within another IfStatement will automatically generate the correct, nested JavaScript code. Deeply nested FL code can be fed into the compiler, and correct, complex code will be generated from very simple code within each tree node. This, for me, is a true example of elegant code.

So, for each non-terminal, we will create a class which extends Irony.Compiler.AstNode, with properties exposing each of its child nodes:

We need each node to have a method to generate code from. One way would be to have a method defined in a base class or an interface which takes in a StringBuilder and writes code to it. An interface will do:

You must take a little time to think about in which context the generated code will be run in. In this example, JavaScript is being generated, and so it makes sense for a program to be converted to a JavaScript function. The function will have parameters such as region and items that the code can use. The ProgramNode will generate the function declaration, and then get its ChildNodes to add their own code to the function. With that in mind, here is the code generation function for OrderLoopNode:

Just as the C# compiler generates some strange class names behind the scenes, __i is used for the counter in order to not clash with the user's own variable names. The variables global and weight are not garbled as the point of their existence is to allow the user's code to access these values. Note that although the grammar allows nested loops, the generated JavaScript would not work only because you would have re-declarations of the __i, weight, and quantity variables. It would be possible to generate JavaScript in such a way to avoid this problem, however that is outside the scope of this article (note though that the IfStatementNode will successfully create nested JavaScript).

An Expression may contain non-terminals and terminals, and so this must be handled in its own way. It basically checks whether or not the child node(s) implement IJavaScriptGenerator or not. If they don't, then the text is added to the program verbatim. This works because all terminals in FL (the variables, strings, and numbers) are exactly the same in FL and JavaScript. If a string in the source language was different from a string in the target language, then some transformation would need to take place.

So, we've got these classes which should generate code when they are part of the Abstract Syntax Tree, but you may well be wondering how these classes come to be part of the tree in the first place. Well, the answer lies in the non-terminal declarations in the FLGrammar constructor. Here is the IfStatement declaration:

var ifStatement = new NonTerminal("ifStatement");

It turns out we can specify the type of class to use in the tree for this non-terminal by specifying the type in an overloaded NonTerminal constructor:

Assuming valid input, the js variable now contains a JavaScript function that returns the freight cost for a given input. This function is then included in the page and run when necessary. See FLCompiler.js for the full code, including the error handling, and see Default.aspx to see the code that calls the generated function.

You can try writing your own FL programs, seeing the generated JavaScript, and executing that JavaScript on the demo page.

An Exercise for the Reader

As pointed out earlier in the article, even a well thought-out algorithm can have strange results with certain inputs. Assuming that the source code was input in a textbox in some administration area, an excellent idea would be to look at the generated Abstract Syntax Tree and determine which execution paths are possible, and then generate test data on the fly and show examples of freight costs for different orders, in order to give confidence that the algorithm makes sense. For example, if there is a statement such as if region is "asia", then have two sets of input: one for Asia, and one for the rest of the world. While this is outside the scope of this article, it obviously involves traversing the generated tree and looking at each node (this is what the C# compiler is doing when it warns that there is unreachable code, etc.).

Adding this extra level of intelligence could be the difference between a really annoying FL language that is more trouble than it's worth and an amazingly useful tool which can prevent ridiculous business rules seeing the light of day.

Summary

I hope these two articles have shown you that creating a simple language is really quite simple (a complicated language can still be complicated, though). In particular, I hope this article has made the following points:

Allowing nested code is achieved by recursively defining non-terminals (e.g., when a non-terminal which is a type of statement that contains child nodes which may contain statements).

Giving each type of node the responsibility to generate code for only the area that is concerned with makes the generation of code, even deeply nested, spaghetti code, relatively simple.

Irony provides a lot of help, for example, by providing pre-defined terminals such as string literals. There are still many features of Irony that I haven't touched on, and you should always have a look at the Irony source code and forums on CodePlex to see if Irony has already provided the functionality you need.

But wait a minute, I know what you're thinking...

"Actually, this entire article was useless"

Well, you may have a point there. Under what circumstances would you want to execute freight cost calculations in JavaScript? The fact that this was an example for illustration purposes notwithstanding, there are many cases where rather than generating code, you just want to execute it. In other words, you want to write your first interpreter. Oh dear, I sense another article coming on...

Comments and Discussions

Hello
first, thank you for very helpful article about DSL and use of irony.
I am new in Irony and as I was practicing to learn irony better. I have come to a position that I couldn't solve.
In my example, I am calculating mathematical shapes like Area and volume. for circle I only require one number (radius) but for rectangle or square I require 2 or 3 numbers.
How can I make sure that when I choose circle as shape then program asks for one number (radius) and when choose rectangle or square the program asks for 2 or 3 numbers (height, width, length)

Hi
this is Roman, Irony's coordinator. Daniel asked me to answer your question, so here I am.
The mechanism for creating nodes in Irony changed sinse this article was written. Now you have to implement Init method of IAstNodeInit interface for initialization of the node, primarily for setting links to previously created child AST nodes of the current node. Look at nodes in Irony.Ast namespace. Let's continue on codeplex
Roman

As for your other question, you could probably do it either way. It's hard to say really - you just need to imagine which would be the clearer, more straight-forward way to do it and then try it out I suppose. Good luck!

I concur with Christian! I went hunting for the interpreter article only to come up disappointed

But, thank you for this article, it was an excellent introduction to domain specific language creation for someone who was clueless. Irony looks like an awesome tool, espcially for someone already versed in C# development.

I was disappointed to see that Irony recently lost support from its primary (sole) contributor on codeplex. I would hate to see it stagnate after just discovering that it existed