Is it logical for several parser reduction functions to get called along the way of reducing one expression?

StackOverflow https://stackoverflow.com/questions/21809408

  •  12-10-2022
  •  | 
  •  

سؤال

I wrote a parser that analyzes code and reduces it as would by lex and yacc (pretty much.)

I am wondering about one aspect of the matter. If I have a set of rules such as the following:

unary: IDENTIFIER
     | IDENTIFIER '(' expr_list ')'

The first rule with just an IDENTIFIER can be reduced as soon as an identifier is found. However, the second rule can only be reduced if the input also includes a valid list of expressions written between parenthesis.

How is a parser expected to work in a case like this one?

If I reduce the first identifier immediately, I can keep the result and throw it away if I realize that the second rule does match. If the second rule does not match, then I can return the result of the early reduction.

This also means that both reduction functions are going to be called if the second rule applies.

Are we instead expected to hold on the early reduction and apply it only if the second, longer rule applies?


For those interested, I put a more complete version of my parser grammar in this answer: https://codereview.stackexchange.com/questions/41769/numeric-expression-parser-calculator/41786#41786

هل كانت مفيدة؟

المحلول

Bottom-up parsers (like bison and yacc) do not reduce until they reach the end of the production. They don't have to guess which reduction they will use until they need it. So it's really not an issue to have two productions with the same prefix. In this sense, the algorithm is radically different from a top-down algorithm, used in recursive descent parsing, for example.

In order for the fragment which you provide to be parseable by an LALR(1) parser-generator -- that is, a bottom-up parser with the ability to examine only one (1) token beyond the end of a production -- the grammar must be such that there is no place in which unary might be followed by (. As long as that is true, the fact that the parser can see a ( is sufficient to prevent the unit reduction unary: IDENTIFIER from happening in a context in which it should reduce with the other unary production.

(That's a bit of an oversimplification, but I don't think that it would be correct to reproduce a standard text on LALR parsing here on SO.)

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top