Question

I've been writing a little monadic parser-combinator library in F# (somewhat similar to FParsec) and now tried to implement a small parser for a programming language.

I first implemented the code in Haskell (with Parsec) which ran perfectly well. The parsers for infix expressions are designed mutually recursive.

parseInfixOp :: Parser String -> Parser Expression -> Parser Expression
parseInfixOp operatorParser subParser = ignoreSpaces $ do
                                          x <- ignoreSpaces $ subParser
                                          do
                                            op <- ignoreSpaces $ operatorParser
                                            y  <- parseInfixOp operatorParser subParser
                                            return $ BinaryOp op x y
                                           <|> return x

parseInfix :: [String] -> Parser Expression -> Parser Expression
parseInfix list = parseInfixOp (foldl1 (<|>) $ map string list)

parseExpr :: Parser Expression
parseExpr = parseInfix0

parseInfix0 = parseInfix ["==", "<>", "And", "Or", "Xor", "<", ">", "<=", ">="] parseInfix1
parseInfix1 = parseInfix ["+", "-", "Mod"] parseInfix2
parseInfix2 = parseInfix ["*", "/", "\\"] parseInfix3
parseInfix3 = parseInfix ["^"] parseInfix4
parseInfix4 = parseFactor

parseFactor :: Parser Expression
parseFactor = parseFactor' <|> (betweenChars '(' ')' parseExpr)

parseFactor' :: Parser Expression
parseFactor' = parseString
           <|> parseBool
           <|> parseNumber
           <|> parseVariable
           <|> (try parseFunCall) <|> parseIdentifier  

Since the order of functions doesn't matter and Haskell is evaluating in a non-strict way, this is OK, but F# is evaluating strictly.

let rec parseExpr = parseInfix0
and parseFactor = (parseFactor') <|> (betweenChars '(' ')' parseExpr)
and parseInfix2 = parseInfix ["^"] parseFactor BinaryOp
and parseInfix1 = parseInfix ["*"; "/"] parseInfix2 BinaryOp
and parseInfix0 = parseInfix ["+"; "-"] parseInfix1 BinaryOp
and parseFunCall = parser {
        let! first = letter
        let! rest = many (letter <|> digit)
        let  funcName = toStr $ first::rest

        do! ignoreSpace
        let! args = betweenChars '(' ')' $ sepBy (parseExpr) ","

        return FunCall(funcName, args)
    }
and parseFactor' =
    parseNumber
<|> parseString
<|> parseBool
<|> parseFunCall
<|> parseIdentifier

F# now either complains about recursive objects and just throws a StackOverflowException at runtime due to an infinite loop or it even doesn't compile the source because "a value would be part of its own definion".

What's the best way to prevent this errors. The debugger advices me to make use functions or lazys instead but what should I make lazy here?

Was it helpful?

Solution

What is the warning about recursive objects? Show the text; there's one such warning that is ignorable (indeed, in a sense desirable) for this case.

If it doesn't compile because of recursive values, you simply need to turn the 'syntactic values' into 'syntactic functions'. That is, rather than

...
and parseInfix2 = body
...

use

...
and parseInfix2 x = (body) x
...

even though the type of 'parseInfix2' is the same function type either way... F# (unlike Haskell) will sometimes require you to be explicit (do eta-conversion as above).

I'd ignore suggestions about inserting 'lazy', parsers are indeed functions, not values, so the eta-conversion will cover the same issue (none of this will be evaluated eagerly at all, it all needs to 'wait' until you pass the string to be parsed before anything starts to 'run').

Regarding StackOverflowExceptions, if you post a looping snippet of the stack it may help, but you maybe can see for yourself e.g. if you have a left-recursive portion of the grammar that doesn't consume any input and gets caught in a loop. I think that's an easy pitfall for most parsing technologies in most languages.

OTHER TIPS

η-conversion is not necessarily a great solution - if you do this, you'll have to prove that the delayed function is run at most once, or pay a lot of overhead for calling it during parsing.

You want something like this:

let rec p1 = lazy (...)
and p2 = ... p1.Value ..

A better way, if you have a workflow builder, is to define the Delay member to do this for you:

type Builder() =
    member this.Delay(f: unit -> Parser<_>) : Parser<_> = 
        let promise = Lazy.Create f
        Return () |> Bind (fun () -> promise.Value)

let parse = new Builder()


let rec p1 =
    parse {
        ...
    }

and p2 =
    parse {
        ...
    }

Neither the eta rewrite nor the lazy delaying is a sure thing. The F# compiler seems to have a hard time dealing with deep recursions. What worked for me was to collapse the recursion into a single top level function (by passing the function to be called recursively as an argument). This top level is written eta style.

Top level, I have:

let rec myParser s = (parseExpr myParser) s

Note wikipedia says: "In a strict language like OCaml, we can avoid the infinite recursion problem by forcing the use of a closure. ". That is what worked for me.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top