Left and right recursion

We will use two different grammars, one right recursive: exp : id | id '+' exp ;
and one left recursive: exp : id | exp '+' id ;
to recognise a + b + c as an expression.

(Assume that we can recognise
and convert a and b and c to identifiers somehow - they
are only used here to make the form of the parse trees clear.)

With both grammars, the expression nodes of the parse tree can take only a
limited form: either a single identifier, or else a + with an identifier on
one (specified) side and an expression (i.e. another node) on the other.
There is therefore one expression node for each identifier. Of the three
expression nodes this implies for a + b + c, one will be just an identifier,
and the other two will each also contain one of the +s and a subexpression.

However, it is always possible to rewrite the grammar so that YACC
can perform the calculations, although it may not always be easy or obvious.
For example, the (left-recursive) grammar used above is really something
like:
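One plausible sketch (the exact form intended is an assumption) treats the
left recursion as iteration, so that YACC's actions can accumulate a value
left-to-right as each id arrives:

```yacc
/* EBNF-style view (assumed): exp = id { '+' id }          */
/* Equivalent left-recursive YACC rule, with actions that  */
/* compute the value left-to-right; 'id' and its value $1  */
/* are assumptions for illustration.                       */
exp : id            { $$ = $1; }
    | exp '+' id    { $$ = $1 + $3; }
    ;
```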

Precedence and associativity

The example calculators given in section 3 give different precedence
to the operators +, -, *, /, ( and
) by using a separate grammar rule for each precedence level.

We can use left recursion for left associative operators and right recursion
for right associative operators (section 4.2). For
non-associative operators, we just do not use recursion at all;
e.g. a == b == c is legal in C but not in Pascal (which also uses
= instead of ==), so we could use this grammar rule for Pascal:
condition : exp '=' exp ;

The problem with these methods is that they tend to complicate the grammar
and constrain language design. YACC has an alternative mechanism that
decouples the declarations of precedence and associativity from the grammar,
using %left, %right and %nonassoc. e.g.:
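A sketch of what such declarations might look like (the particular operators
and precedence levels chosen here are assumptions for illustration, listed
from lowest precedence to highest):

```yacc
%nonassoc '='              /* lowest precedence, non-associative    */
%left '+' '-'
%left '*' '/'              /* higher than '+' and '-'               */
%right '^'                 /* highest precedence, right associative */
%%
exp : exp '=' exp
    | exp '+' exp
    | exp '-' exp
    | exp '*' exp
    | exp '/' exp
    | exp '^' exp
    | '(' exp ')'
    | id
    ;
```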

YACC can also give different precedences to overloaded operators;
e.g. if - is used both for negation and for subtraction, and we want to
give them different precedences (with negation having the same precedence
as not, '!'):

| '-' exp %prec '!'
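In context, a sketch of how this %prec clause might sit in a fuller grammar
(the other rules and precedence levels here are assumptions for
illustration):

```yacc
%left '+' '-'         /* lowest: binary plus and minus           */
%left '*' '/'
%right '!'            /* highest: not (and, via %prec, negation) */
%%
exp : exp '+' exp
    | exp '-' exp         /* binary minus: precedence of '-' */
    | exp '*' exp
    | exp '/' exp
    | '!' exp
    | '-' exp %prec '!'   /* unary minus: precedence of '!'  */
    | id
    ;
```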

To allow for any sort of associativity, we have had to use grammar rules in
the example that contain both left and right recursion - if it were not for
the %left etc., this grammar would be hopelessly ambiguous and so unusable.
(Disambiguation uses precedence first, then associativity, as expected.) The
converse of this is that you must be careful how you use these facilities,
as they can conceal unexpected ambiguities in the grammar.
(Section 6 will explore the problems of ambiguous grammars.)

As it stands, parsers generated by YACC halt at the first syntax
error. This is not very informative to users, as there may be more errors to
be found, so how can we improve error handling?

single-symbol errors

We could explicitly modify the grammar to detect likely errors:

idlist : id
| idlist ',' id
| idlist id {yyerror ("missing ,");}
;

bigger errors

The parser must skip input symbols and exit parse rules
until it reaches a recognizable restart point. With YACC, we can
explicitly modify the grammar to indicate where to restart parsing:

statement : variable ASSIGN expression   /* ASSIGN is the ':=' token from LEX */
| . . .
| error
;

The error token suggests where error recovery takes place. YACC also has
actions yyerrok, yyclearin, YYACCEPT and YYERROR
to assist in this process.
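For example, a common recovery pattern (sketched here as an assumption, not
taken from the text) is to resynchronise on the next ';' and then tell the
parser that the error has been dealt with:

```yacc
stmt_list : stmt
    | stmt_list stmt
    ;
stmt : exp ';'
    | error ';'  { yyerrok; }  /* discard input up to the next ';',
                                  then resume normal parsing      */
    ;
```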

Most modern parser-generators attempt, more or less successfully, to do both
these levels of error recovery automatically, so all we have to write is the
correct grammar. (Another reason to use a parser-generator, rather than
write parsers by hand.)

A tool like YACC can cope with most combinations of precedence,
associativity, number of operands and fixity (prefix, infix or postfix) in
expressions. It can even cope with most forms of overloading.
However, YACC would have great difficulty in coping with an overloaded
operator whose precedence, associativity, number of operands or
fixity depended on its operands.
Probably because most humans would also find this hard to cope with, I know
of no such language.

Language designers can essentially ignore the difficulty of implementing
expressions and concentrate on usability, as the average user and the
average compiler writer would probably agree about which features they liked.

Design a grammar for boolean expressions, involving operators AND, OR
and NOT, brackets ( and ), and operands TRUE, FALSE and C-like identifiers.
An example of the sort of expression allowed is:
a AND (B OR Abcd_99 AND NOT (FRED OR TRUE))
You can use any grammar notation you like initially. Split your grammar into
the portions that should be dealt with by LEX and by YACC, and write the
corresponding LEX and YACC programs. What is the parse tree you expect for
the example expression above?

Modify the calculator in section 3.6 to use
%left etc. You will need to merge and modify the rules for
exp, term and factor.
See what happens when you alter the precedence and/or associativity of the
operators in various ways.

Check your favourite compilers to see how they cope with simple
grammatical errors, such as missing or extra {, }, ',' and
; symbols. Is the error reported correctly, and is it located
accurately? Are there any subsequent spurious error messages? How
complicated a grammatical error can it cope with sensibly?

Write LEX and YACC code to recognise grammars
written in BNF and/or EBNF, like the examples in section 4.1.
(i.e. write a grammar that describes grammars!)