Bottleneck in math parser Haskell

https://stackoverflow.com/questions/20432596

29-08-2022
|

Question

I got this code below from the wiki books page here. It parses math expressions, and it works very well for the code I'm working on. Although there is one problem, when I start to add layers of brackets to my expression the program slows down dramatically, crashing my computer at some point. It has something to do with the number of operators I have it check for, the more operators I have the less brackets I can parse. Is there anyway to get around or fix this bottleneck?

Any help is much appreciated.

import Text.ParserCombinators.ReadP

-- slower
operators = [("Equality",'='),("Sum",'+'), ("Product",'*'), ("Division",'/'), ("Power",'^')]
-- faster
-- ~ operators = [("Sum",'+'), ("Product",'*'), ("Power",'^')]

skipWhitespace = do 
                 many (choice (map char [' ','\n']))
                 return ()

brackets p = do 
             skipWhitespace
             char '('
             r <- p
             skipWhitespace
             char ')'
             return r

data Tree op = Apply (Tree op) (Tree op) | Branch op (Tree op) (Tree op) | Leaf String deriving Show

leaf = chainl1 (brackets tree
              +++ do
                     skipWhitespace
                     s <- many1 (choice (map char "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-[]" ))
                     return (Leaf s))
             (return Apply)
tree = foldr (\(op,name) p ->
             let this = p +++ do 
                                 a <- p +++ brackets tree
                                 skipWhitespace
                                 char name
                                 b <- this
                                 return (Branch op a b)
              in this)
           (leaf +++ brackets tree)
           operators


readA str = fst $ last $ readP_to_S tree str 


main = do loop

loop = do 
    -- ~ try this
    -- ~ (a+b+(c*d))
    str <- getLine
    print $ last $ readP_to_S tree str
    loop

Solution

This is a classic problem in backtracking (or parallel parsing, they are basically the same thing).... Backtracking grows (at worst) exponentially with the size of the input, so the time to parse something can suddenly explode. In practice backtracking works OK in language parsing for most input, but explodes with recursive infix operator notation. You can see why by considering how many possibile ways this could be parsed (using made up & and % operators):

a & b % c & d

could be parsed as

a & (b % (c & d))
a & ((b % c) & d)
(a & (b % c)) & d
((a & b) % c) & d

This grows like 2^(n-1). The solution to this is to add some operator precidence information earlier in the parse, and throw away all but the sensible cases.... You will need an extra stack to hold pending operators, but you can always go through infix operator expressions in O(1).

LR parsers like yacc do this for you.... With a parser combinator you need to do it by hand. In parsec, there is a Expr package with a buildExpressionParser function that builds this for you.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow