LL grammars and left-recursiviity

https://cs.stackexchange.com/questions/6809

16-10-2019
|

Question

Why LL(k) and LL(∞) are incompatible with left-recursion? I understand that a LL(k) language can support left-recursivity provided that with k-overahead tokens can be resolved any ambiguity. But, with a LL(∞) grammar, which type of ambiguities can't be solved?

Solution

The problem that $LL$ variants have with left recursion is inherent to the way $LL$ works: it is a top-down type parser, which means it replaces nonterminals by their productions.

An $LL$-style parser works as follows. It traverses the input from left to right in one go. If we are at some point in the input, then we know that everything to the left of this point is OK. For everything to the right of this point, the parser has constructed an 'approximation' of what it expects to see next. Consider for example this grammar:

1: $E \to E + E$
2: $E \to x$

Note that the grammar is not $LL$, but we can still parse inputs in $LL$-style. On input $x+x+x$, an $LL$-style parser may end up at position $x+\bullet x+x$. Let's assume it has decided that the left part, $x+$, is fine, and for the rest of the input it expects to see $x+E$. It will then find out that $x+x+$ is fine, with $E$ remaining. It may then replace this $E$ by a production, in particular production 2 above. With $x$ remaining, the parser will accept the input.

The trick is then to correctly decide the replacing production for a given nonterminal. A grammar is $LL(k)$ if we can do this by just looking at the next $k$ input symbols, and other techniques are known that are more powerful.

Now consider the following grammar:

1: $A \to A a$
2: $A \to \varepsilon$

If an $LL$ parser tries to replace $A$ by a production, it has to decide between production 1 and 2.

Let's consider what the proper course of action would be if our parser was omniscient. Every time it replaces the $A$ by production 1, it 'adds' an $a$ to what it expects for the remaining input (the expected remainder goes from $A$ to $Aa$ to $Aaa$...), but the $A$ at the start does not go away. Eventually, it must pick production 2, after which the $A$ disappears and it can never again add $a$s to the expectation.

As there is no chance to match a few more input symbols, the parser must decide at exactly that input position how many times production 1 must be matched. This means it must know exactly how many times in our case the $a$ will appear in the remainder of the input at this moment.

However, $LL(k)$ can see only $k$ symbols ahead. This means that if production 1 must be chosen more than $k$ times, the parser cannot 'see' this and so is doomed to fail. $LL(*)$ is better at parsing than $LL(k)$, because it can see arbitrarily far ahead in the input, but the crucial detail (which is not always mentioned) is that this lookahead is regular.

To imagine what happens, you can view the algorithm as follows: when it has to decide which production to take, it starts up a finite state machine (a DFA, which is equivalent in power to regular expressions) and lets this machine look at the rest of the input. This machine can then report 'use this production'. However, this machine is severely limited in what it can do. Although it is strictly better than looking at only the next $k$ symbols, it cannot for instance 'count', which means that it cannot help in the above situation.

Even if you were to 'hack' in some counting function in this finite automaton, then there are still left-recursive grammars for which you really need more power. For instance, for this grammar:

$A \to A B$
$A \to \varepsilon$
$B \to ( B )$
$B \to \varepsilon$

you would have to match 'towers' of matching braces, which is something a finite automaton cannot do. Worse still:

$A \to B C A D E$
$A \to A'$
$A' \to A' D E$
$A' \to \varepsilon$
$B \to a B a \mid b B b \mid a a \mid bb$
$C \to c C c \mid d C d \mid c c \mid d d$
$D \to e D e \mid f D f \mid e e \mid f f$
$E \to g E g \mid h E h \mid g g \mid h h$

is a totally awful grammar, for which I'm pretty sure no known linear time parsing algorithm works and all known general parsing algorithms take quadratic time. Worse still, any grammar describing this language is necessarily left-recursive. The grammar is still unambiguous however. You need a hand-crafted parser to parse these monsters in linear time.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange