How to validate dynamically defined grammar elements in PyParsing

https://stackoverflow.com/questions/17946734

04-06-2022

Question

I am implementing a parser for a fairly complex grammar using PyParsing. (Which, if I may add, is really a pleasure to use!)

The grammar is somewhat 'dynamic' in that it allows for the definition of (various) alphabets, which in turn define the elements allowed in other definitions. As an example:

alphabet: a b c
lists:
s1 = a b
s2 = b c x

Here, alphabet is meant to define what elements are allowed in the lists definitions. E.g., s1 would be valid, but s2 contains an invalid x.

A simple PyParsing parser without that kind of validation could look like this:

from pyparsing import Literal, lineEnd, Word, alphanums,\
    OneOrMore, Group, Suppress, dictOf

def fixedToken(literal):
    return Suppress(Literal(literal))

Element = Word(alphanums)

Alphabet = Group(OneOrMore(~lineEnd + Element))
AlphaDef = fixedToken("alphabet:") + Alphabet

ListLine = OneOrMore(~lineEnd + Element)
Lists = dictOf(Word(alphanums) + fixedToken("="), ListLine)

Start = AlphaDef + fixedToken("lists:") + Lists

if __name__ == "__main__":

    data = """
    alphabet: a b c
    lists:
    s1 = a b
    s2 = b c x
    """

    res = Start.parseString(data)
    for k, v in sorted(res.items()):
        print k, "=", v

This parses the input and gives the output:

s1 = ['a', 'b']
s2 = ['b', 'c', 'x']

However, I would like the parser to raise a ParseException (or similar) for s2, since it contains the invalid x. Ideally, I would like to define ListLine as something like OneOrMore(oneOf(Alphabet)). Evidently, that would require some dynamic interpretation, which can only be done once Alphabet has actually been parsed and assembled.

One solution I found was to add parse actions to 1. remember the alphabet and 2. validate the lines:

# ...
Alphabet = Group(OneOrMore(~lineEnd + Element))
def alphaHold(toks):
    alphaHold.alpha = set(*toks)
    print "Alphabet=", alphaHold.alpha
Alphabet.addParseAction(alphaHold)

AlphaDef = fixedToken("alphabet:") + Alphabet

ListLine = OneOrMore(~lineEnd + Element)
def lineValidate(toks):
    unknown = set(toks).difference(alphaHold.alpha)
    if len(unknown):
        msg= "Unknown element(s): {}".format(unknown)
        print msg
        raise ParseException(msg)
ListLine.addParseAction(lineValidate)
# ...

This gives almost the desired output:

Alphabet= set(['a', 'c', 'b'])
Unknown element(s): set(['x'])
s1 = ['a', 'b']

Unfortunately, PyParsing catches the ParseException raised from the parse action and treats it as an ordinary failed match, so s2 is silently dropped instead of being reported as an error. Is there another way to achieve this within PyParsing which I might have missed?

Solution

You are already pretty close to having this working. There are a number of cases where a pyparsing parser dynamically adjusts itself based on text that was previously parsed. The trick is to use a Forward placeholder expression, and then insert the desired values into the placeholder as part of a parse action (very close to what you have in place now). Like this:

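from pyparsing import Forward, OneOrMore, alphas, lineEnd, oneOf

# Placeholder whose contents are filled in once the alphabet has been parsed
Element = Forward()

Alphabet = OneOrMore(~lineEnd + oneOf(list(alphas)))
def alphaHold(toks):
    # use '<<' here, not '<<=', which would make Element a local name
    Element << oneOf(toks.asList())
Alphabet.setParseAction(alphaHold)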

From here, I think the rest of your code works fairly well as-is. Actually, you won't even need the line validating function, as pyparsing will only match valid element names as elements using this method.
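Putting the Forward placeholder together with the grammar from the question gives roughly the following. This is a minimal sketch, not a definitive implementation: the names GOOD, BAD and check are hypothetical, Word(alphanums) is assumed for alphabet entries so that multi-character names are accepted, and parseAll=True is an addition of this sketch. Without parseAll=True, pyparsing would stop quietly before the invalid x and return s2 as ['b', 'c'].

```python
from pyparsing import (Forward, Literal, OneOrMore, ParseException, Suppress,
                       Word, alphanums, dictOf, lineEnd, oneOf)


def fixedToken(literal):
    return Suppress(Literal(literal))


# Placeholder: its real content is only known once the alphabet is parsed.
Element = Forward()

# Assumption of this sketch: alphabet entries are alphanumeric words.
Alphabet = OneOrMore(~lineEnd + Word(alphanums))


def alphaHold(toks):
    # Plain '<<' on purpose: '<<=' inside a function would make Python
    # treat Element as a local variable.
    Element << oneOf(toks.asList())


Alphabet.setParseAction(alphaHold)
AlphaDef = fixedToken("alphabet:") + Alphabet

ListLine = OneOrMore(~lineEnd + Element)
Lists = dictOf(Word(alphanums) + fixedToken("="), ListLine)

Start = AlphaDef + fixedToken("lists:") + Lists

# Hypothetical sample inputs for illustration.
GOOD = "alphabet: a b c\nlists:\ns1 = a b\ns2 = b c\n"
BAD = "alphabet: a b c\nlists:\ns1 = a b\ns2 = b c x\n"


def check(text):
    """Return the parsed lists as a dict, or None if the text is rejected."""
    try:
        # parseAll=True turns the unparsed "x" into a ParseException.
        res = Start.parseString(text, parseAll=True)
    except ParseException:
        return None
    return {name: list(res[name]) for name in res.keys()}


if __name__ == "__main__":
    print(check(GOOD))
    print(check(BAD))
```

Here check(GOOD) returns both lists, while check(BAD) returns None because the trailing x is not part of the alphabet and so cannot be consumed.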

You might find that pyparsing's error reporting is a little fuzzy. You can get things to be a little better using '-' instead of '+' in some judicious places. Since pyparsing uses ParseExceptions for all of its internal signalling of expression matches/mismatches, it does not automatically recognize when you have gotten into a defined expression, but then have an invalid match on a contained expression. You can tell pyparsing to detect this using the '-' operator, like this:

ListDef = listName + '=' - OneOrMore(~lineEnd + Element)

Once pyparsing gets a name and an '=' sign, then any invalid Element found will immediately raise a ParseSyntaxException, which will stop pyparsing's scan of the text at that point, and report the exception at the location of the invalid element.
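The '-' operator can be tried out with a sketch like the one below. It rests on some assumptions: listName is taken to be Word(alphanums), the names GOOD, BAD and parse are hypothetical, and a suppressed lineEnd is added after the element list, which is not part of the snippet above. One caveat worth checking in your own grammar: OneOrMore stops without error at the last valid element, so an invalid element in the middle of a line (as in s2 = b c x) only surfaces as a ParseSyntaxException if something after the '-' must still match, here the end of the line.

```python
from pyparsing import (Dict, Forward, Group, OneOrMore, ParseSyntaxException,
                       Suppress, Word, alphanums, lineEnd, oneOf)

Element = Forward()

# Assumption of this sketch: alphabet entries are alphanumeric words.
Alphabet = OneOrMore(~lineEnd + Word(alphanums))


def alphaHold(toks):
    # Plain '<<' on purpose; '<<=' would make Element local to this function.
    Element << oneOf(toks.asList())


Alphabet.setParseAction(alphaHold)

listName = Word(alphanums)  # assumed definition of listName

# Everything after '-' must match once a name and '=' have been seen;
# a failure there is raised as ParseSyntaxException instead of backtracking.
ListDef = Group(listName + Suppress("=")
                - OneOrMore(~lineEnd + Element)
                + Suppress(lineEnd))

Start = (Suppress("alphabet:") + Alphabet
         + Suppress("lists:") + Dict(OneOrMore(ListDef)))

# Hypothetical sample inputs for illustration.
GOOD = "alphabet: a b c\nlists:\ns1 = a b\ns2 = b c\n"
BAD = "alphabet: a b c\nlists:\ns1 = a b\ns2 = b c x\n"


def parse(text):
    """Parse the text and return the lists as a plain dict."""
    res = Start.parseString(text)
    return {name: list(res[name]) for name in res.keys()}


if __name__ == "__main__":
    print(parse(GOOD))
    try:
        parse(BAD)
    except ParseSyntaxException as err:
        # err.lineno and err.col point at the place where matching failed
        print("syntax error:", err)
```

Because the exception is a ParseSyntaxException and not a plain ParseException, the enclosing OneOrMore does not swallow it, and no parseAll=True is needed to see the error.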

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow