A comparison of top-down and bottom-up parsers

What is the difference between top-down and bottom-up parsing? What are the benefits of the two when developing a compiler? What are the disadvantages?

Solution Preview

The primary difference between top-down and a bottom-up parsers is the way that the parser matches the input tokens to the grammar used to specify the language that is being parsed. It is best to illustrate this using a sample grammar and show the different ways this is parsed
with both types of parsers.

The sample grammar we'll use is for matching a list of identifiers.
The grammar, in BNF form, looks like this (enclosed in <> means a non-terminal):

id_list_start => ID <id_list>
id_list => , ID <id_list> |
;

The BNF shows that an <id_list> starts with an "ID" and is followed by 0 or more ", ID" elements and ends with a ";". We will look at the way that the two different parser approaches parse the input "id1, id2, id3;".

A top-down parser takes the first input token ("id1" which is an ID), and finds the rules that have a production (right hand side) that starts with this token. In this case the <id_list_start> rule production starts with ID so we know that we are doing to start trying
to complete that rule. The rest of the production is the non-terminal <id_list>. The parser reads the next token which is (","). It looks at all of the options for <id_list> and finds the production ", ID <id_list>" that matches. It reads the token "id2" and that also matches the ID in that same production. Then the non-terminal <id_list> is encountered again. This process repeats until the parser reads the token ";". At that point it chooses the second production for <id_list>, ";". Since there are no non-terminals in that production and the input matches, the parse completes successfully.

Notice that this approach was "top-down" in terms of a parse tree. We started at the "top" of the grammar and matched tokens going "down" the grammar rules.

A bottom-up parser is quite different. This type of parser is going to continue reading input tokens until it can complete an entire right hand side for any rule. Then it will substitute those tokens with the associated left hand side. The process continues until all of the
tokens have been read in.

For the example input, "id1, id2, id3;", the parser processes as follows. The first ID token is read in ...

Solution Summary

This solution explains the difference between top-down and bottom-up parsers. It also discusses the situations in which one is better than the other.