Parsing parenthesized expressions

https://stackoverflow.com/questions/9240079

f#
fparsec

28-04-2021
|

Question

I have the following fsyacc grammar for (a slightly modified form of) SQL search conditions:

scalar_expr:
    | ID                                    { Identifier($1) }
    | constant                              { Constant($1) }
    | unary_op scalar_expr                  { Unary($1, $2) }
    | scalar_expr binary_op scalar_expr     { Binary($2, $1, $3) }
    | LPAREN scalar_expr RPAREN             { $2 }

search_condition:
    | search_condition OR search_condition  { Or($1, $3) }
    | search_condition AND search_condition { And($1, $3) }
    | scalar_expr comparison scalar_expr    { Comparison($2, $1, $3) }
    | LPAREN search_condition RPAREN        { $2 }

I've re-implemented it in FParsec (with some help from a previous question). Here are the relevant bits:

let binOpp = OperatorPrecedenceParser()
let scalarExpr = binOpp.ExpressionParser
binOpp.TermParser <- 
  [ constant 
    id
    between lparen rparen scalarExpr ]
  |> choice

// binary/unary ops added here

let comparison = 
  let compareExpr = pipe3 scalarExpr compareOp scalarExpr (fun l op r -> Comparison(op, l, r))
  between lparen rparen compareExpr <|> compareExpr

let andTerm = stringCIReturn "and" (fun l r -> And(l, r)) .>> ws
let orTerm = stringCIReturn "or" (fun l r -> Or(l, r)) .>> ws

let searchCondition, searchConditionRef = createParserForwardedToRef()
searchConditionRef:= 
  chainl1 comparison (andTerm <|> orTerm)        
  <|> between lparen rparen searchCondition

This parses 1 = 1 or 2 = 2, but wrapping a constant or the entire search condition in parens causes it to fail (oddly, wrapping a comparison in parens works). Here's an example that fails:

Error in Ln: 1 Col: 8
(1 = 1 or 2 = 2)
       ^
Expecting: infix operator or ')'
: 8

A scalar, comparison, and search condition may all start out similarly (open paren -> constant -> infix operator), but are essentially differentiated by the type of operator eventually encountered. For example, if you hit or you know the opening paren belongs to the entire condition and not the comparison on the left-hand side. Is this properly handled by backtracking? If so, how would you fail--while parsing a complex expression--in such a way that doesn't consume any input?

Handling the optional parentheses for scalars, comparisons, and search conditions is handled by left recursion in the fsyacc grammar. I understand this needs to be factored out in FParsec. But from the above error, I can't imagine how to get away from extensive backtracking regardless.

Solution

Meta: Why doesn't the FParsec tag work on this question?

I'll quote myself from the comments on the page for the previous question:

The nested but not mutally recursive expression grammars make the parens parsing a bit nasty here. The problem is that when the parser sees an opening bracket at certain locations, it doesn't yet know whether the parenthesized expression needs to be parsed as a scalarExpr, comparison or searchCondition. To be able to parse such expressions, you have to introduce some limited backtracking for parser errors after opening brackets and before closing brackets, so that the parser can tentatively parse a parenthesized expression with one subgrammar and, if necessary, parse again with a different grammar.

let tryBetweenParens p = lparen >>? (p .>>? rparen)

let opp = OperatorPrecedenceParser<_,_,_>()
let scalarExpr = opp.ExpressionParser
opp.TermParser <- choice [constant; id; tryBetweenParens scalarExpr]

// ... 

let comparison = // doesn't currently allow chained comparisons ( e.g. 1 = 2 = 3)
    let compareExpr = pipe3 scalarExpr compareOp scalarExpr (fun l op r -> Comparison(op, l, r))
    compareExpr <|> tryBetweenParens compareExpr

// ...

let searchCondition, searchConditionRef = createParserForwardedToRef()
do searchConditionRef:= 
    chainl1 (comparison <|> between lparen rparen searchCondition) 
            (andTerm <|> orTerm)

The complete code is available at http://pastebin.com/i7JFJWJE

With the usual expression grammars, where any parenthesized (top-level) expression is valid everywhere where a leaf term is valid, parsing is obviously simpler, because you just need to deal with parens at one place in the grammar. This is another argument for just using a single OperatorPrecedenceParser, as Stephen Swensen suggested. However, you'll have to annotate the AST with source locations if you want to be able to generate good error messages after parsing.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow