math expression evaluator in javascript: part 2 (parser)

This is the second part of my weekly series on how to write a math expression evaluator in JavaScript. If you miss the first part, which was about lexical analysis, go and read it first and come back later.

In this installment, we will see the actual parsing process. Based on a sequence of tokens obtained from the lexer, we shall be able to match these tokens to the expression grammar which describes the syntax of the expression. At the end of the process, we want to produce a syntax tree. This entire procedure is commonly known as syntactic analysis.

An abstract syntax tree (AST) represents the syntactic structure of the expression. Consider the following expression:

x =-6*7

The associated syntax tree for the above expression should look like:

The above syntax tree matches our logic if we are supposed to evaluate the expression (if we handle every node in the tree using depth-first traversal): take 6, negate it, multiply the result with 7, put the result into x. Unless you don’t pass primary school math, your brain should magically gives -42 as the answer.

Of course, the question is how to create the tree given a list of tokens associated with the expression? There are multiple strategies to analyze the expression’s syntax. The one which I illustrate below is commonly known recursive descent parser. It may not be the fast one in some cases, but it is very illustrative and quite easy to trace should you want to follow the parsing logic.

Again, the following paragraphs explain and discuss various code fragment only (error checkings may be omitted). The full code is available for your pleasure from the TapDigit project.

The main entry point for the parsing looks like this. The lexer itself comes from the implementation of the lexical analyzer discussed in part 1.

From this, we go to the main parseExpression function which is surprisingly simple. This is because our syntax only implies a variable assignment as an expression. For other languages with more elaborate control flow (branching, loop, etc) or some form of domain-specific language (DSL), assignment may not be the only form of expression.

function parseExpression(){return parseAssignment();}

For the next subsequent parseFoo variants, we need a function which can match an operator. If the incoming operator is the same as the expected one, then it returns true.

An example form of assignment is x = 42. However, we also want to tackle the case where the expression is also as plain as 42 or a nested assignment such as x = y = 42. See if you can understand how the implementation of parseAssignment below handles all the three cases (hint: recursive is a possibility).

The function parseAdditive processes both addition and subtraction, i.e. it creates a binary operator node. There will be two child nodes, the left and the right ones. They represent the two subexpression, further handled by parseMultiplicative, to be added or subtracted. Update: the previous version of the function was copied from a wrong implementation, this has been fixed since.

Before we go and check the details of parseUnary, you may wonder why parseAdditive is first called and then parseMultiplicative. This is done in order to satisfy the operator precedence requirement. Consider the expression 2 + 4 * 10 which actually evaluates to 42 (multiply 4 by 10, then add 2) rather than 60 (add 2 to 4, multiply by 10). This is only possible if the topmost node in syntax tree is binary operator + which has two child nodes, the left one is just the number 2 and the right node is actually another binary operator *. The latter holds two numbers as the corresponding child nodes, 4 and 10.

To handle a negation, like -42, we use the concept of unary operation. In the syntax tree, this is represented by a unary operator node and it has only one child node (hence the name). While negation is one form of unary operation, there is also a case such as +42 which we need to take into account. Thanks to the recursive nature, expression like ----42 or even -+-+42 can be handled without any problem as well. The code to handle the said unary operation is as simple as the following:

Now here comes one of the most important function of all: parsePrimary. First of all, let’s see four possible forms of primary node:

an identifier (basically referring to a variable in this context), e.g. x

a number, e.g. 3.14159

a function call, e.g. sin(0)

another expression enclosed in brackets, e.g. (4 + 5)

Fortunately, deciding whether the incoming tokens will form one of the above possibilities is rather easy as we just need to examine the token type. There is only ambiguity between an identifier and a function call, which can be solved if we peek at the next token, i.e. whether it is an open bracket or not. Without further ado, here is the code:

Now the remaining part is parseFunctionCall. If we see an example of a function call like sin(0), it basically consists of a function name, open bracket, function argument, and close bracket. It is important to realize that there can be more than one arguments (foo(1, 2, 3)) or no argument at all (random()), depending on the function itself. For simplicity, we split the handling of function argument to parseArgumentList. Here are both functions for your pleasure:

Voila! That’s all our parser code. When combined properly into a functional object, it is just about 200 lines of code, supporting various math operations with proper precedences, brackets, variables, and function calls.

Related Posts:

you may be interseted in a parser tree graph . I haven’t had time to clean it up but go to the cleanup branch/tag of https://github.com/dacresni/parse-tree-grapher outputs to latech, I want to output to javascript and make that the display.

http://dacresni.github.com dacresni

you may be interseted in a parser tree graph . I haven’t had time to clean it up but go to the cleanup branch/tag of https://github.com/dacresni/parse-tree-grapher outputs to latech, I want to output to javascript and make that the display.

Masklinn

Since you’re parsing expressions (and pretty much only expressions), you should look into Pratt’s top-down operator parser.

I’m currently using one to implement an evaluator for Python expressions, and FWIW here’s the implementation for parsing the “+” operator, including prefix and infix uses:

(well that’s not completely true, the implementation of “+” actually is only 2 function calls: most operators will precisely follow the pattern above changing the values for `id` and `bp` so it’s easy to create factory functions and avoid having to type the same thing 40 times in a row).

While I played with and had to implemented different parser techniques in the past, I still believe that the recursive descent is the best for a tutorial (i.e. the main goal of this blog post). It is easy to follow and easy to debug!

Having said that, experienced developer should look at Crockford’s implementation of top down parsing. It is beautiful and compact, and it’s still my most favorite parsing code so far.

Masklinn

Since you’re parsing expressions (and pretty much only expressions), you should look into Pratt’s top-down operator parser.

I’m currently using one to implement an evaluator for Python expressions, and FWIW here’s the implementation for parsing the “+” operator, including prefix and infix uses:

(well that’s not completely true, the implementation of “+” actually is only 2 function calls: most operators will precisely follow the pattern above changing the values for `id` and `bp` so it’s easy to create factory functions and avoid having to type the same thing 40 times in a row).

While I played with and had to implemented different parser techniques in the past, I still believe that the recursive descent is the best for a tutorial (i.e. the main goal of this blog post). It is easy to follow and easy to debug!

Having said that, experienced developer should look at Crockford’s implementation of top down parsing. It is beautiful and compact, and it’s still my most favorite parsing code so far.

whoami

My name is Ariya Hidayat. These days, I promote software craftsmanship, JavaScript, HTML5, CSS3, and general web technologies. I write blog posts regularly and speak at developer events from time to time.

I am a big believer in sharing and openness. I have been involved with FOSS (free/open-source software), contributing code to projects such as KDE, Qt, and WebKit. In my little spare time, I also run projects such as PhantomJS (browser automation) and Esprima (JavaScript parser).

Notice

This blog is my personal space. Unless otherwise noted, I don't discuss activities related to my professional work. All opinions expressed in this blog are my own and do not necessarily represent the official view of my employer.

Imprint

Certain links, including hypertext links, in my blog will take you outside my blog. Links are provided for your convenience and inclusion of any link does not imply endorsement or approval of the linked site, its operator or its content. I am not responsible for the content of any website outside of my blog.

Subscribe

Enter your email address to subscribe to this blog and receive notifications of new posts by email.