Limitations of LL vs LR parsers?

https://stackoverflow.com/questions/5467244

14-11-2019
|

Question

I know the basic differences of LL vs LR parsers. I also know that GLR, SLR, and LALR are all extensions of LR parsers. So my question in more details is...

Given a LL(*) parser and any variation on a LR parser, is there any language that can be described in one and not the other? Or more simply is there any feature or property that can not be expressed with either?

As a concrete example. If I were to create a language using an LL(*) parser, will I ever run into desired feature/property that I might want to add to my language that would only be possible with a LR parser (or vice versa)?

Solution

Sure. LL parsers cannot handle any grammars with left recursion. L(AL)R(k) parsers for fixed k won't be able to parse some things that an LL(*) parser can handle, because k<*.

OTHER TIPS

Here are a couple of opinion pieces, you can consider them point and counterpoint:

Parsing ought to be easier - in favor of LL
Why I prefer LALR parsers - in favor of LR

You might find interesting this paragraph in Wikipedia, which says, that LL(*) grammars are subset of LR(k) grammars: http://en.wikipedia.org/wiki/Context-free_grammar#Restrictions So you can parse more languages using LR parsing methods.

There are some grammars that cannot be "rewritten" to be parsed by LL parsers which can be parsed by LR parsers. One example: Given a simple grammar that constructs terms with substractions:

S -> S - S | num

You obviously have left-recursion here, which cannot be handled by LL-parsers. To make this grammar parsable by LL, you have to eliminate the left-recursion:

S -> num S'

S' -> - num S' | epsilon

Now your LL parser can handle this grammar. But when building up your syntax tree for a term like 4 - 2 -1, a depth-first-search operated on the tree will give you 4 - (2 - 1) = 3 instead of (4 - 2) - 1 = 3 as you would expect.

The reason for this is, that you have to use left-recursive rules in your grammar to handle left-associative operators (like substract). But left-recursive rules cannot be handled by LL-parsers.

So here you have a class of languages that cannot be handled by LL.

LR parser can accept bigger class of languages than LL. In LL(k) and LR(k), k means number of lookahead symbols that it needs to know so it can apply appropriate production/reduction. The bigger k get, the bigger are parsing tables. So k is not limiting only LR parsers, it is limiting LL parsers, too. The reason why LR parser can accept bigger class of languages is because of left recursion that are problematic when working with LL parsers. But this is not true entirely, because direct recursion are solvable, meaning that you can rewrite the grammar to be LL grammar. Direct recursion is something like A -> Abc. When you have indirect recursion, for which you now probably know how it looks, then you have a problem. LR parsers can solve this issues because they make parse tree in bottom-up manner. You will have to look little bit deeper into LR parsing to see why is that exactly. But, LR parsers are not all mighty, they have limitations, too. Some grammars are very difficult to digest and k factor does not help. For this kind of grammars GLR parser is needed, which actually simulates LR(k) parser but uses backtracking to analyze entire parse space when production/reduction ambiguity occurs.

LL parsing is theoretically O(n^4), or pretty dang slow. LR parsing is faster, O(n^3), or pretty slow. https://en.wikipedia.org/wiki/Top-down_parsing

Although I'd love to see proof of this.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow