Question

I have the following grammar for the setlx language in PLY:

Rule 0     S' -> file_input
Rule 1     file_input -> statement_list
Rule 2     epsilon -> <empty>
Rule 3     statement_list -> statement
Rule 4     statement_list -> statement_list statement
Rule 5     statement -> simple_statement SEMICOLON
Rule 6     statement -> compound_statement
Rule 7     simple_statement -> expression_statement
Rule 8     simple_statement -> assert_statement
Rule 9     simple_statement -> assignment_statement
Rule 10    simple_statement -> augmented_assign_statement
Rule 11    simple_statement -> backtrack_statement
Rule 12    simple_statement -> break_statement
Rule 13    simple_statement -> continue_statement
Rule 14    simple_statement -> exit_statement
Rule 15    simple_statement -> return_statement
Rule 16    simple_statement -> quantor
Rule 17    simple_statement -> term
Rule 18    expression_statement -> expression
Rule 19    backtrack_statement -> BACKTRACK
Rule 20    break_statement -> BREAK
Rule 21    continue_statement -> CONTINUE
Rule 22    exit_statement -> EXIT
Rule 23    return_statement -> RETURN
Rule 24    return_statement -> RETURN expression
Rule 25    expression_list -> expression
Rule 26    expression_list -> expression_list COMMA expression
Rule 27    expression -> implication
Rule 28    expression -> lambda_definition
Rule 29    expression -> implication EQUIVALENT implication
Rule 30    expression -> implication ANTIVALENT implication
Rule 31    implication -> disjunction
Rule 32    implication -> disjunction IMPLICATES disjunction
Rule 33    disjunction -> conjunction
Rule 34    disjunction -> disjunction OR conjunction
Rule 35    conjunction -> comparison
Rule 36    conjunction -> conjunction AND comparison
Rule 37    comparison -> sum
Rule 38    comparison -> sum EQ sum
Rule 39    comparison -> sum NEQ sum
Rule 40    comparison -> sum LT sum
Rule 41    comparison -> sum LE sum
Rule 42    comparison -> sum GT sum
Rule 43    comparison -> sum GE sum
Rule 44    comparison -> sum IN sum
Rule 45    comparison -> sum NOTIN sum
Rule 46    sum -> product
Rule 47    sum -> sum PLUS product
Rule 48    sum -> sum MINUS product
Rule 49    product -> reduce
Rule 50    product -> product TIMES reduce
Rule 51    product -> product DIVIDE reduce
Rule 52    product -> product IDIVIDE reduce
Rule 53    product -> product MOD reduce
Rule 54    product -> product CARTESIAN reduce
Rule 55    reduce -> unary_expression
Rule 56    reduce -> reduce SUM unary_expression
Rule 57    reduce -> reduce PRODUCT unary_expression
Rule 58    unary_expression -> power
Rule 59    unary_expression -> SUM unary_expression
Rule 60    unary_expression -> PRODUCT unary_expression
Rule 61    unary_expression -> HASH unary_expression
Rule 62    unary_expression -> MINUS unary_expression
Rule 63    unary_expression -> AT unary_expression
Rule 64    unary_expression -> BANG unary_expression
Rule 65    power -> primary
Rule 66    power -> primary POW unary_expression
Rule 67    primary -> atom
Rule 68    primary -> attributeref
Rule 69    primary -> subscription
Rule 70    primary -> slicing
Rule 71    primary -> procedure
Rule 72    primary -> call
Rule 73    primary -> primary BANG
Rule 74    atom -> identifier
Rule 75    atom -> literal
Rule 76    atom -> enclosure
Rule 77    identifier -> IDENTIFIER
Rule 78    identifier -> UNUSED
Rule 79    attributeref -> primary DOT identifier
Rule 80    subscription -> primary LBRACKET expression RBRACKET
Rule 81    slicing -> primary LBRACKET lower_bound RANGE upper_bound RBRACKET
Rule 82    lower_bound -> expression
Rule 83    lower_bound -> epsilon
Rule 84    upper_bound -> expression
Rule 85    upper_bound -> epsilon
Rule 86    literal -> stringliteral
Rule 87    literal -> integer
Rule 88    literal -> floatnumber
Rule 89    literal -> boolean
Rule 90    stringliteral -> STRING
Rule 91    stringliteral -> LITERAL
Rule 92    integer -> INTEGER
Rule 93    floatnumber -> DOUBLE
Rule 94    boolean -> TRUE
Rule 95    boolean -> FALSE
Rule 96    enclosure -> parenth_form
Rule 97    enclosure -> set_display
Rule 98    enclosure -> list_display
Rule 99    parenth_form -> LPAREN expression RPAREN
Rule 100   set_display -> LBRACE expression RANGE expression RBRACE
Rule 101   set_display -> LBRACE expression COMMA expression RANGE expression RBRACE
Rule 102   set_display -> LPAREN argument_list RPAREN
Rule 103   list_display -> LBRACKET expression RANGE expression RBRACKET
Rule 104   list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105   list_display -> LBRACKET argument_list RBRACKET
Rule 106   lambda_definition -> lambda_parameters LAMBDADEF expression
Rule 107   lambda_parameters -> identifier
Rule 108   lambda_parameters -> LT parameter_list GT
Rule 109   assignment_statement -> target ASSIGN expression
Rule 110   target -> expression
Rule 111   augmented_assign_statement -> augtarget augop expression
Rule 112   augtarget -> identifier
Rule 113   augtarget -> attributeref
Rule 114   augtarget -> subscription
Rule 115   augop -> PLUS_EQUAL
Rule 116   augop -> MINUS_EQUAL
Rule 117   augop -> TIMES_EQUAL
Rule 118   augop -> DIVIDE_EQUAL
Rule 119   augop -> IDIVIDE_EQUAL
Rule 120   augop -> MOD_EQUAL
Rule 121   assert_statement -> ASSERT LPAREN expression COMMA expression RPAREN
Rule 122   term -> TERM LPAREN term_arguments RPAREN
Rule 123   term_arguments -> expression_list
Rule 124   term_arguments -> epsilon
Rule 125   procedure -> PROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 126   procedure -> CPROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 127   parameter_list -> procedure_param
Rule 128   parameter_list -> parameter_list COMMA procedure_param
Rule 129   parameter_list -> epsilon
Rule 130   procedure_param -> identifier
Rule 131   call -> primary LPAREN argument_list RPAREN
Rule 132   call -> primary LPAREN RPAREN
Rule 133   argument_list -> expression
Rule 134   argument_list -> argument_list COMMA expression
Rule 135   quantor -> FORALL LPAREN iterator_chain PIPE expression RPAREN
Rule 136   quantor -> EXISTS LPAREN iterator_chain PIPE expression RPAREN
Rule 137   iterator -> target IN expression
Rule 138   iterator_chain -> iterator
Rule 139   iterator_chain -> iterator_chain COMMA iterator
Rule 140   compound_statement -> if_statement
Rule 141   compound_statement -> switch_statement
Rule 142   compound_statement -> match_statement
Rule 143   compound_statement -> while_loop
Rule 144   compound_statement -> do_while_loop
Rule 145   compound_statement -> for_loop
Rule 146   block -> statement_list
Rule 147   block -> epsilon
Rule 148   if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE
Rule 149   if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE LBRACE block RBRACE
Rule 150   if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE if_statement
Rule 151   switch_statement -> SWITCH LBRACE case_statements default_case RBRACE
Rule 152   case_statements -> case_list
Rule 153   case_statements -> epsilon
Rule 154   case_list -> case_statement
Rule 155   case_list -> case_list case_statement
Rule 156   case_statement -> CASE expression COLON block
Rule 157   default_case -> DEFAULT COLON block
Rule 158   default_case -> epsilon
Rule 159   match_statement -> MATCH
Rule 160   while_loop -> WHILE LPAREN expression RPAREN LBRACE block RBRACE
Rule 161   do_while_loop -> DO LBRACE block RBRACE WHILE LPAREN expression RPAREN SEMICOLON
Rule 162   for_loop -> FOR LPAREN iterator_chain RPAREN LBRACE block RBRACE

On the last few meters, I now get some conflicts:

WARNING: 
WARNING: Conflicts:
WARNING: 
WARNING: shift/reduce conflict for IN in state 34 resolved as shift
WARNING: shift/reduce conflict for COMMA in state 94 resolved as shift
WARNING: shift/reduce conflict for RPAREN in state 154 resolved as shift

How can I resolve them without generating new conflicts? I understand where they come from, but I have no idea about fixing it. Any help or general advice is appriciated.

Was it helpful?

Solution

I'll do these backwards, because that way we go from easiest to hardest. In fact, I don't really have a solution for the first conflict.

The third conflict is the result of an actual ambiguity in the grammar. You need to get rid of the ambiguity:

Rule 96    enclosure -> parenth_form
Rule 97    enclosure -> set_display
Rule 99    parenth_form -> LPAREN expression RPAREN
Rule 102   set_display -> LPAREN argument_list RPAREN
Rule 133   argument_list -> expression

Consequently, if we're looking for an enclosure and we find a simple parenthesized expression, it could either be a parenth_form or it could be a set_display containing an argument_list of exactly one expression. I suspect that the intention here is that a simple parenthesized expression would be a parenth_form, but there's no way to tell from the grammar.

The simplest solution would be to just get rid of parenth_form altogether, and check for the case of a one-element argument_list when you build the AST node for a set_display corresponding to rule 102. Another possibility is to be explicit about it; change Rule 102 to require the set_display to have at least two expressions:

set_display -> LPAREN expression COMMA argument_list RPAREN

That still requires you to juggle the AST, though, because you have to prepend the expression to the argument_list when you build the set_display node.

The second S/R conflict is actually quite similar. It arises because of:

Rule 104   list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105   list_display -> LBRACKET argument_list RBRACKET

So:

LBRACKET expression COMMA expression ...

would require reduction by Rule 104 if the following symbol is RANGE; by Rule 105 if the following symbol is RBRACKET; and by Rule 134 if the following symbol is COMMA. (That's a rough approximation, since it assumes that we already know the end of the second expression.) As written, though, the grammar needs to commit to one of these paths as soon as it sees the first COMMA, because it needs to decide at that moment whether to create an argument_list or not.

The solution is to delay the parser's decision, which is easy but ugly:

list_display -> LBRACKET expression RANGE expression RBRACKET
list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
list_display -> LBRACKET expression RBRACKET
list_display -> LBRACKET expression COMMA argument_list RBRACKET

Now, the first COMMA is always shifted and the decision on what type of list_display to reduce is delayed until the end of the second expression (if there are two expressions), but it's necessary to juggle the AST for the last two productions to correct the argument_list.

The first S/R conflict arises because IN is used both as an operator and as a syntactic part of an iterator:

Rule 44    comparison -> sum IN sum
Rule 137   iterator -> target IN expression

But because target is just an expression, and expression can derive sum, it's not possible (most of the time) for the parser to know which IN it's looking at until much later in the parse.

The previous technique of deferring the decision won't work here, because you need to know which type of IN you're looking at in order to correctly apply operator precedence. Suppose we're in a context where we need an iterator and the input is:

atom1 AND atom2 IN atom3

If that's the iterator (i.e., the next symbol is COMMA or RPAREN), then that is, effectively:

( atom1 AND atom2 ) IN atom3

However, if that's the left-hand side of an iterator, then it needs to be parsed completely differently:

( atom1 AND ( atom2 IN atom3 ) ) IN expression

Moreover, atom3 could have been an arbitrary expression, perhaps atom3 AND atom4, leading to the two parses:

( atom1 AND atom2 ) IN ( atom3 AND atom4 )
( atom1 AND ( atom2 IN atom3 ) AND atom4 ) IN expression

This is why puns are bad in language design.

I strongly suspect that there is no LR(k) grammar which will be able to parse that particular corner of your language, although that's just based on intuition; I have no proof. However, a GLR parser would have no trouble with it, because it is not actually ambiguous. I don't know if there is a GLR parser generator in Python; if you're not tied to Python, you could certainly use bison.

The GLR parser would also have solved the second conflict, which is also not the result of an ambiguity.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top