Is there a well-defined and well-reasoned transformation from LL(*) to PEG?

https://stackoverflow.com/questions/13127605

15-07-2021
|

Question

I am in the process of investigating PEG (Parsing Expression Grammar) parsers, and one of the topics I'm looking into is equivalence with other parsing techniques.

I found a good paper about transforming regexes into equivalent PEGs at From Regular Expressions to Parsing Expression Grammars.

I am hoping to find a similar treatment for LL(*) parsers but have as-yet come up empty-handed. It seems to me that a lot of the techniques described in 1 are also going to be applicable to the problem of LL(*) transformation, however I'm not sufficiently steeped in the formalisms to be confident of my own analysis.

Your collective help would be much appreciated!

Solution

The Wikipedia article about PEG says it all, I think. PEG does recursive descent by using clause ordering for disambiguation. In theory, the family of languages that can be parsed with recursive descent is the LL family, but, because PEG has unlimited lookahead and no ambiguity, the family should be a larger one, probably full CFG.

Every LL(k) grammar can be implemented by a recursive-descent parser with k lookahead, therefore every LL(k) grammar can be transformed to a PEG grammar by ordering the rules so those that require the longest lookahed are listed first.

This is an LL(k) grammar:

params = expr
params = expr ',' params

To make it a PEG grammar for the same language, the rules must be reordered:

params = expr ',' params
params = expr

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow