Question

I have recently implemented Paull's algorithm for removing left recursion from context-free grammars:

Assign an ordering $A_1, \dots, A_n$ to the nonterminals of the grammar.

for $i := 1$ to $n$ do begin
$\quad$ for $j:=1$ to $i-1$ do begin
$\quad\quad$ for each production of the form $A_i \to A_j\alpha$ do begin
$\quad\quad\quad$ remove $A_i \to A_j\alpha$ from the grammar
$\quad\quad\quad$ for each production of the form $A_j \to \beta$ do begin
$\quad\quad\quad\quad$ add $A_i \to \beta\alpha$ to the grammar
$\quad\quad\quad$ end
$\quad\quad$ end
$\quad$ end
$\quad$ transform the $A_i$-productions to eliminate direct left recursion
end
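For concreteness, here is a minimal runnable Python sketch of the loop above (my own rendering, not code from the paper). I represent a grammar as a dict mapping each nonterminal to a list of right-hand sides, each right-hand side being a list of symbols; `nonterminals` is the chosen ordering $A_1, \dots, A_n$, and direct left recursion is removed with the standard fresh-nonterminal construction:

```python
def eliminate_direct_left_recursion(grammar, a, fresh):
    """Replace A -> A alpha | beta with A -> beta A', A' -> alpha A' | epsilon."""
    recursive = [rhs[1:] for rhs in grammar[a] if rhs and rhs[0] == a]
    if not recursive:
        return
    others = [rhs for rhs in grammar[a] if not rhs or rhs[0] != a]
    grammar[a] = [rhs + [fresh] for rhs in others]
    grammar[fresh] = [alpha + [fresh] for alpha in recursive] + [[]]  # [] is epsilon

def paull(grammar, nonterminals):
    """Run Paull's algorithm in place; `nonterminals` is the ordering A_1..A_n."""
    for i, a_i in enumerate(nonterminals):
        for a_j in nonterminals[:i]:
            new_rhss = []
            for rhs in grammar[a_i]:
                if rhs and rhs[0] == a_j:
                    # replace A_i -> A_j alpha with A_i -> beta alpha
                    # for every production A_j -> beta
                    new_rhss.extend(beta + rhs[1:] for beta in grammar[a_j])
                else:
                    new_rhss.append(rhs)
            grammar[a_i] = new_rhss
        eliminate_direct_left_recursion(grammar, a_i, a_i + "'")
```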

According to this document, the efficiency of the algorithm crucially depends on the ordering of the nonterminals chosen at the beginning; the paper discusses this issue in detail and suggests optimisations.

Some notation:

We will say that a symbol $X$ is a direct left corner of a nonterminal $A$, if there is an $A$-production with $X$ as the left-most symbol on the right-hand side. We define the left-corner relation to be the reflexive transitive closure of the direct-left-corner relation, and we define the proper-left-corner relation to be the transitive closure of the direct-left-corner relation. A nonterminal is left recursive if it is a proper left corner of itself; a nonterminal is directly left recursive if it is a direct left corner of itself; and a nonterminal is indirectly left recursive if it is left recursive, but not directly left recursive.

Here is what the authors propose:

In the inner loop of Paull’s algorithm, for nonterminals $A_i$ and $A_j$, such that $i > j$ and $A_j$ is a direct left corner of $A_i$, we replace all occurrences of $A_j$ as a direct left corner of $A_i$ with all possible expansions of $A_j$.

This only contributes to elimination of left recursion from the grammar if $A_i$ is a left-recursive nonterminal, and $A_j$ lies on a path that makes $A_i$ left recursive; that is, if $A_i$ is a left corner of $A_j$ (in addition to $A_j$ being a left corner of $A_i$).

We could eliminate replacements that are useless in removing left recursion if we could order the nonterminals of the grammar so that, if $i > j$ and $A_j$ is a direct left corner of $A_i$, then $A_i$ is also a left corner of $A_j$.

We can achieve this by ordering the nonterminals in decreasing order of the number of distinct left corners they have.

Since the left-corner relation is transitive, if C is a direct left corner of B, every left corner of C is also a left corner of B.

In addition, since we defined the left-corner relation to be reflexive, B is a left corner of itself.

Hence, if C is a direct left corner of B, it must follow B in decreasing order of number of distinct left corners, unless B is a left corner of C.
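To spell out the counting step (my own restatement, writing $\mathrm{LC}(X)$ for the set of left corners of $X$): if $C$ is a direct left corner of $B$, the two facts above give $\mathrm{LC}(B) \supseteq \mathrm{LC}(C) \cup \{B\}$, so if $B$ is not a left corner of $C$, then $|\mathrm{LC}(B)| \geq |\mathrm{LC}(C)| + 1$, and $B$ strictly precedes $C$ when sorting by decreasing left-corner count.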

All I want to know is how to order the nonterminals at the beginning, but I can't work it out from the paper. Can someone explain it in a simpler way? Pseudocode would help me understand it better.

Solution

This is actually not very complicated. I'll assume that epsilon productions have already been eliminated from the grammar, because they would only obscure the key concept of a "left corner".

Form a graph G whose vertices are the non-terminals of the grammar. Draw a directed edge from A to B if there is any production rule of the form "A -> B[...]". The paper calls B a "direct left corner" of A. More generally, another non-terminal C is called a "left corner" of A if there is some path from A to C along the edges of this graph. Finding all left corners therefore amounts to computing the transitive closure of G; call the result H.
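A hedged sketch of that construction (using the same dict-of-lists grammar representation as the sketch in the question; the function names are my own, and only edges to non-terminals matter here):

```python
def direct_left_corners(grammar):
    """G: edge A -> B whenever some production A -> B ... exists (B a nonterminal)."""
    g = {a: set() for a in grammar}
    for a, rhss in grammar.items():
        for rhs in rhss:
            if rhs and rhs[0] in grammar:  # first symbol is a nonterminal
                g[a].add(rhs[0])
    return g

def transitive_closure(g):
    """H: maps each nonterminal to everything reachable from it in G."""
    h = {}
    for a in g:
        reachable, stack = set(), list(g[a])
        while stack:
            b = stack.pop()
            if b not in reachable:
                reachable.add(b)
                stack.extend(g[b])
        h[a] = reachable
    return h
```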

The paper suggests counting the number of left corners each vertex A has (i.e. how many non-terminals you can reach from A, which is the out-degree of A in the graph H) and then sorting the vertices in decreasing order of this count.
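Putting the pieces together, the initial ordering the question asks for could be computed like this (illustrative only; ties can be broken arbitrarily):

```python
def paull_ordering(grammar):
    """Order nonterminals by decreasing number of distinct left corners."""
    h = transitive_closure(direct_left_corners(grammar))
    # the left-corner relation is reflexive, so A counts as its own left corner
    return sorted(grammar, key=lambda a: len(h[a] | {a}), reverse=True)

# For example, with  A -> B a  and  B -> A b | c,  both A and B have the
# left corners {A, B}, so either order is acceptable:
grammar = {"A": [["B", "a"]], "B": [["A", "b"], ["c"]]}
print(paull_ordering(grammar))  # ['A', 'B'] (ties keep dict order)
```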

One handwavy motivation for this policy is that if there is an important non-terminal (say the start symbol S) with connections to many other symbols, then it makes sense to purge left recursion from S early on, because if you leave it until later there will be more copies of S that you need to expand out. I think the explanation in the paper is more convincing, but perhaps less obvious.

Licensed under: CC-BY-SA with attribution