SI413 Class 10: SLR Parsing

Using the deterministic Characteristic Finite State Machine (the full one with state numbers AND symbols on the stack) given
below for the grammar:

S -> E
E -> E + T | T
T -> n

write down the sequence of stacks produced by the SLR(0)
parsing algorithm on input n+n+n, and

write down the sequence of stacks produced by the SLR(0)
parsing algorithm on input n+nn+n up to the
point at which the algorithm determines that the input it
invalid.

In the previous part, how does the algorithm determine
that the input is indeed invalid?

Getting a handle on the stack in bottom up parsing

A rule with the • in it is called an "LR item". Think
of them as a goal: E→•T means "I'm trying to read
E by reading T", E→E+•T means "I'm trying
to read E by reading E+T and I've already read E+".
Each transisition represents moving to a subgoal that helps
achive the previous goal in a way consistent with the grammar.

Parsing with the non-deterministic
characteristic finite state machine

This machine we build this way is the non-deterministic
"Characteristic Finite State Machine" (CFSM). We can
convert to an equivalent deterministic machine, and that is
what is usually meant by CFSM.

SLR Parsing

This procedure is called "Simple LR" parsing, or SLR.
You should notice that in our example we can parse any
string without any lookahead, therefore this grammar is
SLR(0), i.e. SLR parsing with zero lookahead tokens
suffices.

SLR(1) Parsing

More complicated grammars may not be SLR(0), so we may need
lookahead. In fact, if we add multiplication and the usual
prcedence, we get the grammar

S → E
E → E + T | T
T → T * F | F
F → n | (E)

which produces a deterministic CSFM that includes the
following fragment:

Now, if we end up in the left state, we have some ambiguity:
LR item T→T•*F tells us to shift (push a
token), while LR item E→T• tells us to reduce T
to E. This is a "shift/reduce conflict". With lookahead,
however, we can resolve this. If the lookahead symbol is a *,
we know we must reduce. Why? Because * can't follow an "E"
in this grammar, so we really can't do the reduce and
succeed. In contrast, * is exactly what LR item
T→T•*F wants to read next. So we shift.

In other scenarios we might have two different LR items in
the same state that tell us to reduce but, of course, using
different reductions. This is a "reduce/reduce conflict".
Sometimes lookahead can disambiguate this kind of conflict,
but not in SLR parsers.