Parsec: backtracking not working
-
18-09-2019 - |
Question
I am trying to parse F# type syntax. I started writing an [F]Parsec grammar and ran into problems, so I simplified the grammar down to this:
type ::= identifier | type -> type
identifier ::= [A-Za-z0-9.`]+
After running into problems with FParsec, I switched to Parsec, since I have a full chapter of a book dedicated to explaining it. My code for this grammar is
typeP = choice [identP, arrowP]
identP = do
id <- many1 (digit <|> letter <|> char '.' <|> char '`')
-- more complicated code here later
return id
arrowP = do
domain <- typeP
string "->"
range <- typeP
return $ "("++domain++" -> "++range++")"
run = parse (do t <- typeP
eof
return t) "F# type syntax"
The problem is that Parsec doesn't backtrack by default, so
> run "int"
Right "int"
-- works!
> run "int->int"
Left "F# type syntax"
unexpected "-"
expecting digit, letter, ".", "`" or end of input
-- doesn't work!
The first thing I tried was to reorder typeP:
typeP = choice [arrowP, identP]
But this just stack overflows because the grammar is left-recursive--typeP never gets to trying identP
because it keeps trying arrowP
over and over. Next I tried try
in various places, for example:
typeP = choice [try identP, arrowP]
But nothing I do seems to change the basic behaviours of (1) stack overflow or (2) non-recognition of "->" following an identifier.
My mistake is probably obvious to anybody who has successfully written a Parsec grammar. Can somebody point it out?
Solution
I think the problem is that, and I'm making an assumption for F# (because I don't know it), arrows are right associative. I'm not sure how precise the linked grammar is supposed to be, as I'm not well versed in different grammars. But if we can assume arrows are right associative that makes the problem easier.
So with that assumption we can trivially do:
identP = many1 (digit <|> letter <|> char '.' <|> char '`')
typeP = try arrowP <|> identP
arrowP = do
i <- identP
string "->"
t <- typeP
return $ "(" ++ i ++ " -> " ++ t ++ ")"
run = flip parse "F# type syntax" $ do
t <- typeP
eof
return t
So:
Haskell> run "int"
Right "int"
Haskell> run "int->int"
Right "(int -> int)"
Haskell> run "int->int->int->int"
Right "(int -> (int -> (int -> int)))"
Expanding further, what might be confusing you is that in that grammar it says type -> type, which means you could have an arrow on the left side. That's fine, but it needs to be in parentheses. Which helps, maybe seeing the following in action is helpful. It helped me.
typeP = try arrowP <|> parens typeP <|> identP
arrowP = do
i <- parens typeP <|> identP
string "->"
t <- typeP
return $ "(" ++ i ++ " -> " ++ t ++ ")"
parens p = between (char '(') (char ')') p
Now we can write arrows on the left or the right side of an arrow:
Haskell> run "int->int->int"
Right "(int -> (int -> int))"
Haskell> run "(int->int)->int"
Right "((int -> int) -> int)"
OTHER TIPS
I think you should factor the left recursion out of the grammar. Instead of
type ::= identifier | type -> type
identifier ::= [A-Za-z0-9.`]+
you get something like
typeStart ::= identifier
type ::= typeStart (-> type)?
identifier ::= [A-Za-z0-9.`]+
Then this will be easier to translate directly into parsec, I think. (One would think that try
would work, and I expect it somehow does, but yes, my experience also was that I had to be at least waist-deep in Parsec before I ever understood "where to put try
" to make things work.)
Consider seeing also Monadic Parser Combinators in F# (as well as the 7 prior C# blog entries) for some basics. I think the parsec docs (try just reading them top down, they are decent, if I recall correctly) as well as some of the examples in the various research papers talk about issues like the one in your question.
This won't help you understand where you're going wrong, but I'd suggest looking into using sepBy1
to parse types separated by ->
symbols. This will give you a list of parsed types, which you can then turn back into function types afterward.