Question

Since this is my first post I'd like to take the opportunity to say: What a great site SO is!

Anyway, to the question:

I'm somewhat of a Scala newbie, and I'm trying to solve a data extraction and parsing problem with Scala's parser combinators, but I'm getting java.lang.StackOverflowError exceptions.

My real-world example is too big to include, so I'm reusing code from another SO question with the same problem. The code is slightly modified, though. I tried to solve the problem using the PackratParsers but did not succeed.

import scala.util.parsing.combinator.syntactical.StandardTokenParsers
import scala.util.parsing.combinator.PackratParsers

object ArithmeticParser1 extends StandardTokenParsers with PackratParsers {
  lexical.delimiters ++= List("(", ")", "+", "-", "*", "/")

  lazy val reduceList: Int ~ List[String ~ Int] => Int = {
    case i ~ ps => (i /: ps)(reduce)
  }

  def reduce(x: Int, r: String ~ Int) = (r: @unchecked) match {
    case "+" ~ y => x + y
    case "-" ~ y => x - y
    case "*" ~ y => x * y
    case "/" ~ y => x / y
  }

  lazy val expr  : PackratParser[Int] = term ~ rep ("+" ~ term | "-" ~ term) ^^ reduceList
  lazy val term  : PackratParser[Int] = factor ~ rep ("*" ~ factor | "/" ~ factor) ^^ reduceList
  lazy val factor: PackratParser[Int] = "(" ~> expr <~ ")" | numericLit ^^ (_.toInt)

  def main(args: Array[String]) {
    val times = 500
    val s = "(" * times + "1 + 1" + ")" * times
    val tokens = new PackratReader(new lexical.Scanner(s))
    println(phrase(expr)(tokens))
  }
}

I've mixed in the PackratParsers trait, changed the defs to lazy vals, and I'm using a PackratReader. What am I misunderstanding here? From reading Daniel C. Sobral's comments on his answer to the SO question How can I ignore non-matching preceding text when using Scala's parser combinators?, it seems like PackratParsers should do the trick.

Ref: PackratParsers paper


Solution

The problem

The problem is that you are indeed filling the stack. Your expression consists of 500 opening brackets, then "1 + 1", then 500 closing brackets. In terms of your grammar, you have 500 terms of type "factor" nested inside each other, with one term of type "expr" at the innermost level.

For each start of a (nested) term, the parser has to push something onto a stack (in this case, a function call). When the nested term is finished, the parser pops it from the stack (in this case, the function returns). If the stack is empty after the last token, and it never goes negative along the way (pops too much), then your term is well formed (in your case: the parentheses are balanced).

In simple terms: the parser uses the stack to check whether the parentheses are balanced.
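
This behavior can be sketched without any parser library (a minimal illustration, not the combinators' actual implementation): a naive recursive-descent check for nested parentheses uses one JVM stack frame per nesting level, so deep enough input overflows the stack exactly like your "factor" rule does.

```scala
object StackDepthDemo {
  // Returns the index just past the matched group, or -1 on a mismatch.
  // Every '(' triggers one more nested call, i.e. one more stack frame.
  def parse(s: String, i: Int): Int =
    if (i < s.length && s(i) == '(') {
      val j = parse(s, i + 1)                                   // push: descend into the group
      if (j >= 0 && j < s.length && s(j) == ')') j + 1 else -1  // pop: group must close
    } else i                                                    // non-'(' ends the descent

  def balanced(s: String): Boolean = parse(s, 0) == s.length
}
```

For small inputs this works fine; for an input like `"(" * 1000000 + ")" * 1000000` the call to `balanced` dies with a StackOverflowError, because the recursion depth equals the nesting depth.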

You are using several tools to speed up the parsing. None of these tools helps with stack consumption.

Your helpers:

Using a packrat parser

The packrat parser caches already-parsed parts, so they don't need to be parsed again. This can give a nice speed-up when your grammar has lots of alternatives with common parts. It doesn't help in your case, because your alternatives don't share common parts. And it doesn't help with stack consumption.
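
What the memoization buys (and what it doesn't) can be sketched in a few lines. This is a hypothetical toy cache, not the real PackratParsers internals: results are keyed by (rule, input position), so a second alternative that re-enters the same rule at the same position does no new work — but every cache miss is still an ordinary, stack-consuming call.

```scala
object MemoSketch {
  // A packrat-style cache: (rule name, input position) -> parse result,
  // where the result is Some(position after the match) or None.
  private val cache = scala.collection.mutable.Map.empty[(String, Int), Option[Int]]
  var uncachedCalls = 0  // counts how often a rule body actually runs

  // Memoize a rule's result at a given input position.
  def memo(rule: String, pos: Int)(body: => Option[Int]): Option[Int] =
    cache.getOrElseUpdate((rule, pos), { uncachedCalls += 1; body })
}
```

Two alternatives that both try the same rule at the same position run its body only once. Speed improves when backtracking revisits positions; the recursion depth of the first (uncached) descent is unchanged.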

Omitting parts of the result

You use <~ and ~> to drop some parts of the parse result. But this only affects the result built inside a term. When the corresponding rule is invoked, the parser must still push something onto the stack.

Using tokens

You break the input stream up into tokens before you parse it. This usually speeds up the parsing, because tokenizing (non-recursive, usually regular expressions) is much cheaper than parsing (recursive). But in your case the problem lies in the depth of the nested terms (a recursive problem), so all the pressure is in the parsing part, which is what uses up the stack. Tokenizing does not help with that.
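
The difference is easy to see in code. Here is a sketch of a tokenizer for this grammar (an assumed token set, not StandardTokenParsers' actual lexer): it is one flat pass over the input, so its stack depth is constant no matter how deeply the parentheses nest.

```scala
object TokenizerSketch {
  // One regex alternative per token; findAllIn walks the input iteratively
  // and skips characters (like whitespace) that match no alternative.
  private val token = """\(|\)|\+|-|\*|/|\d+""".r

  def tokenize(s: String): List[String] =
    token.findAllIn(s).toList  // no recursion: nesting depth is irrelevant here
}
```

Tokenizing a million nested parentheses is no problem at all — the trouble only starts once the parser has to match them up.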

Solution

I don't think you can solve this easily. Parsers have to use some kind of stack data structure, and usually the built-in call stack is used for performance reasons. You would have to use a stack structure on the heap instead. That is usually much slower and is not compatible with the Scala parser combinators.

You could try to rewrite "factor" in continuation-passing style (CPS). The calls would then be tracked on the heap instead of on the stack.
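
As a sketch of that idea (using the standard library's scala.util.control.TailCalls trampoline rather than the parser combinators themselves), a recursive check for nested parentheses can be made stack-safe: each recursive step is deferred into a TailRec value that lives on the heap, and the `result` driver loop evaluates the chain with constant stack depth.

```scala
import scala.util.control.TailCalls._

object HeapStackParen {
  // Same shape as a recursive-descent "factor" rule, but each recursive
  // step becomes a TailRec value on the heap instead of a JVM stack frame.
  def parse(s: String, i: Int): TailRec[Int] =
    if (i < s.length && s(i) == '(')
      tailcall(parse(s, i + 1)).flatMap { j =>
        if (j >= 0 && j < s.length && s(j) == ')') done(j + 1) else done(-1)
      }
    else done(i)

  // result runs the trampoline loop; nesting depth costs heap, not stack
  def balanced(s: String): Boolean = parse(s, 0).result == s.length
}
```

A nesting depth that would overflow plain recursion is handled fine here, at the cost of allocating one closure per level — which is also why such heap-based schemes are slower than using the built-in stack.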

Out of the box there is no simple solution for this.

Licensed under: CC-BY-SA with attribution