What is the role of the Empty production for PEGs?

https://stackoverflow.com/questions/5879340

28-10-2019

Question

The empty production rule

nonterminal -> epsilon

is useful in lex/yacc-style bottom-up LR parser generators (e.g. PLY).

In what contexts should one use empty productions in PEG parsers, e.g. pyparsing?

Solution

BNF grammars often use empty as an alternative, which effectively makes the overall expression optional:

leading_sign ::= + | - | empty
integer ::= leading_sign digit...

This is unnecessary in pyparsing, since pyparsing includes the Optional class for this:

from pyparsing import Optional, Word, nums, oneOf

# no empty required
leading_sign = Optional(oneOf("+ -"))
integer = leading_sign + Word(nums)
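
As a rough check that the two formulations behave the same, the sketch below translates the BNF literally (with empty as the last alternative) and compares it against the Optional version. The variable names are only illustrative, and the camelCase method names follow the style used in this answer; newer pyparsing releases also offer snake_case equivalents.

```python
from pyparsing import Literal, Optional, Word, empty, nums, oneOf

# Literal translation of the BNF: empty is the final alternative.
bnf_sign = Literal("+") | Literal("-") | empty
bnf_integer = bnf_sign + Word(nums)

# Idiomatic pyparsing: Optional makes the sign optional.
opt_sign = Optional(oneOf("+ -"))
opt_integer = opt_sign + Word(nums)

bnf_signed = bnf_integer.parseString("-42").asList()
opt_signed = opt_integer.parseString("-42").asList()
bnf_unsigned = bnf_integer.parseString("42").asList()
opt_unsigned = opt_integer.parseString("42").asList()

print(bnf_signed, opt_signed)      # both include the sign token
print(bnf_unsigned, opt_unsigned)  # both contain only the digits
```

Both grammars accept signed and unsigned input; the Optional form simply states the intent more directly.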

Empty does come in handy for some pyparsing-specific purposes though:

Skipping over whitespace: some elements in pyparsing, such as CharsNotIn and restOfLine, do not skip over whitespace before starting their parse. Suppose you have a simple input of key-value entries, in which the key is a quoted string and the value is everything after the quoted string, like this:

"Key 1" value of Key 1
"Key 2" value of Key 2

Defining this as:

quotedString + restOfLine

would give you " value of Key 1" and " value of Key 2" as the values. Pyparsing's empty does skip over whitespace, so changing the grammar to:

quotedString + empty + restOfLine

will give you values without the leading spaces.
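
The difference can be seen by running both grammars over the sample input. This is a minimal sketch; the variable names are made up for illustration.

```python
from pyparsing import empty, quotedString, restOfLine

lines = ['"Key 1" value of Key 1', '"Key 2" value of Key 2']

without_empty = quotedString + restOfLine
with_empty = quotedString + empty + restOfLine

# Token 0 is the quoted key, token 1 is the rest of the line.
raw_values = [without_empty.parseString(line)[1] for line in lines]
trimmed_values = [with_empty.parseString(line)[1] for line in lines]

print(raw_values)      # values keep their leading space
print(trimmed_values)  # leading space has been skipped by empty
```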

Activating parse actions at specific places: I used empty expressions as part of the expression generated by originalTextFor to drop in start and end location markers. The parse actions on the empty expressions replace them with their location values, and the parse action for originalTextFor then uses those locations to slice the original text from the input string.
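
The location-marker idea can be sketched roughly as follows. This is a simplified illustration of the technique, not pyparsing's actual originalTextFor source; the helper name with_span and the results names "start" and "end" are hypothetical.

```python
from pyparsing import Empty, Word, alphas, nums

def with_span(expr):
    """Wrap expr so that it returns the original matched text (hypothetical helper)."""
    # Each marker matches nothing; its parse action replaces it with the current location.
    start = Empty().setParseAction(lambda s, loc, toks: loc)
    # The end marker must not skip trailing whitespace, or the slice would include it.
    end = Empty().leaveWhitespace().setParseAction(lambda s, loc, toks: loc)

    marked = start("start") + expr + end("end")
    # Slice the input string between the two recorded locations.
    marked.setParseAction(lambda s, loc, toks: s[toks["start"]:toks["end"]])
    return marked

pair = with_span(Word(alphas) + Word(nums))
span_result = pair.parseString("  abc   123  rest").asList()
print(span_result)
```

Without the wrapper, the same expression would return the two tokens separately and the spacing between them would be lost.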

Be careful when using empty. It always matches, but never advances the parse location (except for skipping whitespace). So:

OneOrMore(empty)

will be an infinite loop.

empty | "A" | "B" | "C"

will never match any of the non-empty alternatives, since a MatchFirst stops at the first alternative that matches.
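
The second pitfall is easy to demonstrate (the infinite loop is deliberately left out, since it would never return). A small sketch, with illustrative variable names, comparing empty as the first alternative against empty as the last:

```python
from pyparsing import Literal, empty

# empty listed first: it always wins, so "A" is never consumed.
empty_first = empty | Literal("A") | Literal("B") | Literal("C")

# empty listed last: the non-empty alternatives get a chance to match.
empty_last = Literal("A") | Literal("B") | Literal("C") | empty

first_result = empty_first.parseString("A").asList()
last_result = empty_last.parseString("A").asList()

print(first_result)  # no tokens: empty matched before "A" was tried
print(last_result)   # the "A" token
```

If empty is needed as a fallback, placing it last in the alternation avoids the problem.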

Licensed under: CC-BY-SA with attribution
