simonh@swidev.demon.co.uk (Simon Huntington) writes:
>> How does an LALR parser with regular expressions (eg. *, +, []) handle the
>> attribute stack? Obviously, the attributes on the stack depend on the number
>> of iterations of constructs such as *, +.

norman@flaubert.bellcore.com (Norman Ramsey) responds:
> For the implementation, I avoid all your questions about shift
> conflicts by rewriting all the extended BNF to ordinary
> BNF. For example, I rewrite closure `(a b c {semantics})*' to
[Note: This method was pioneered by Celantino (or something close). I
don't keep that reference handy as it isn't the technique that our
parser generator uses.]
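The rewrite Ramsey describes can be done mechanically. The sketch below is a hypothetical illustration (the function and nonterminal names are invented, not taken from any parser generator): a closure `(a b c)*` becomes a pair of ordinary BNF productions with a fresh left-recursive nonterminal.

```python
# Hypothetical sketch: rewrite an EBNF closure (X1 X2 ... Xn)* into
# plain BNF by introducing a fresh left-recursive nonterminal.
def rewrite_closure(body, fresh_name):
    """body: list of symbols inside (...)*; fresh_name: new nonterminal.
    Returns the list of (lhs, rhs) productions replacing the closure."""
    return [
        (fresh_name, []),                   # fresh :  /* empty */
        (fresh_name, [fresh_name] + body),  # fresh :  fresh X1 ... Xn
    ]

# (a b c)*  becomes:
#   abc_list :  /* empty */
#   abc_list :  abc_list a b c
prods = rewrite_closure(["a", "b", "c"], "abc_list")
```

Any semantic action attached to the closure body then moves onto the recursive production, where it fires once per iteration.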

and back to Simon:
>> I've also worked out how to handle optional constructs etc., but the problem
>> is that the attribute stack manipulation depends on a shift action and there
>> may be other shifts on the same symbol that don't require the actions. eg.
>>
>> Y -> (a b)+ a
>>
>> when the parser reaches the end of the + construct, it can shift a and
>> leave the construct (stack manipulation required), otherwise it can enter
>> the iteration again (stack manipulation required) - both actions are
>> different and both can't be executed.
>>
>> So, what do I do? Avoid grammars that cause the problem or is there an easier
>> way where I won't get shift action conflicts? Would it be possible to group
>> attributes into lists when the rule is finally reduced.

Because of the rest of your article, I presume that you are doing a
"direct translation" of the ELR grammar to a parser, without a
rewriting step, which can introduce conflicts. (In particular, the
rewritten parser will get conflicts in exactly the place where you
have trouble placing your shift action code, because you need some
tokens of lookahead to determine whether the token just shifted is
part of the iteration construct or not.)
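To make the lookahead problem concrete, here is a hypothetical sketch (invented function name, not part of any parser) for Simon's `Y -> (a b)+ a`: when an `a` is shifted, only the next token reveals whether that `a` started another `(a b)` pair or was the trailing `a` that ends the rule.

```python
# Hypothetical sketch: in Y -> (a b)+ a, an 'a' just shifted is
# ambiguous until we peek one token ahead. If 'b' follows, the 'a'
# begins another iteration; otherwise it is the final 'a'.
def classify_as(tokens):
    """For each 'a' in a stream matching (a b)+ a, report whether it
    begins another iteration ('loop') or ends the rule ('final')."""
    roles = []
    for i, t in enumerate(tokens):
        if t == "a":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            roles.append("loop" if nxt == "b" else "final")
    return roles

print(classify_as(["a", "b", "a", "b", "a"]))  # ['loop', 'loop', 'final']
```

This is exactly why the shift action itself cannot know which stack manipulation to perform at the moment of the shift.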

The "solution" is to defer your collecting until the production is
complete and being reduced. At that point, you know exactly which
rule was applied and can determine the exact number of loops for each
iteration construct. [There are some results by Nakata and Sassa
which allow you to prove that you can always make this determination if
you handle certain ambiguities as I note below.] Note that the
determination is not trivial, since your parser generator has probably
merged states along the way, which obliterates some of the tracking of
which path was taken through optional clauses etc. However, you can
use the techniques you devised previously for optional clauses
to help you leave crumbs for your reduce code to use. There will be
cases where you can't leave crumbs, because the regular expressions are
effectively ambiguous, as in the next example. However, waiting until
the reduce also handles some "ambiguous" rules like:

x: a+ a+; // 2 or more a's

The code which determines the iterations can simply "decide" how to
partition the list of a's between the two iterators in any fashion
which makes sense. It is eminently sensible (and straightforward to
implement) to use the greedy rule found in many editors, where the
first iterator expands to include as much as it can in these ambiguous
cases (effectively rewriting the ambiguous case into "x: a+ a;"). I
believe that if you follow that rule, you can always determine which
crumbs you need to leave at any action point, although in many cases
you end up leaving the crumbs after you have passed beyond the
ambiguity (at a point where you know that n tokens before you were at
the end of the iterator and not just making another loop).
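The greedy rule is easy to state as code. The following is a hypothetical sketch (the helper name and interface are invented): at reduce time, split the flat list of matched a's among the iterators, giving the first everything not needed to satisfy the minimum counts of the iterators after it.

```python
# Hypothetical sketch of greedy partitioning at reduce time.
def greedy_partition(items, min_counts):
    """Split a flat list of matched items among iterators.
    min_counts[i] is the minimum repetitions iterator i must consume
    (1 for '+', 0 for '*'). Each iterator is greedy: it takes every
    item not reserved for the minimums of the iterators after it."""
    n = len(items)
    if n < sum(min_counts):
        raise ValueError("not enough items to satisfy the rule")
    parts, rest = [], n
    for i in range(len(min_counts)):
        reserved = sum(min_counts[i + 1:])   # owed to later iterators
        take = rest - reserved               # grab everything else
        parts.append(items[n - rest:n - rest + take])
        rest -= take
    return parts

# For x: a+ a+ with five a's, the first iterator gets four, the second one,
# matching the rewritten "x: a+ a;".
print(greedy_partition(["a"] * 5, [1, 1]))
```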

As another alternative, you can simply take the contents of the
production without collecting it into distinct iterations. This
allows you to ignore the problem. In the above ambiguous example, $1
would be the 1st a, $2 the 2nd, $3 the 3rd, up to $n for the n-th. The
users can then partition the productions in any way they desire.
This is the approach we used in Yacc++(R).
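A minimal sketch of that flat view (hypothetical code, not the actual Yacc++ interface): the reduce action for `x: a+ a+` simply receives the matched children as one sequence, with $1..$n mapped to indices 0..n-1, and any grouping into iterations is left to the user.

```python
# Hypothetical sketch of the "no collecting" approach: the reduce
# action sees the flat sequence of matched attributes; partitioning
# between the two a+ iterators is the user's business.
def reduce_x(children):
    # This user chooses the greedy split: all but the last item go to
    # the first iterator, the last item to the second.
    first, second = children[:-1], children[-1:]
    return ("x", first, second)

node = reduce_x(["a1", "a2", "a3"])
# node == ("x", ["a1", "a2"], ["a3"])
```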