Question

I'd like to parse a string with nested parenthesis with these conditions:

  • Elements are delimited by comma , or bar |.
  • Nested parenthesis elements might be a single alphanum or another nested parenthesis.
  • Each nested parenthesis element connected by bar | literal leads to creation of a new sequence combining previous sequence elements and forward elements connected by comma, outside that nested parenthesis.

In order to clarify, let me give some examples of input strings and the results they should return:

(a, b, c) should return: a, b, c

(a, (b | c)) should return: a, b and a, c

(a, b, (c | (d, e)), f) should return: a, b, c, f and a, b, d, e, f

(a, b, (c | (d, e) | f), g) should return: a, b, c, g and a, b, d, e, g and a, b, f, g

(a, b, c, ((d, (e | f)) | (g, h)), i) should return: a, b, c, d, e, i and a, b, c, d, f, i and a, b, c, g, h, i

((a | b), c) should return: a, c and b, c

Was it helpful?

Solution

(from the pyparsing wiki) You can get the string parsed using infixNotation (formerly known as operatorPrecedence). Assuming that ',' has precedence over '|', this would look like:

variable = oneOf(list(alphas.lower()))
expr = infixNotation(variable, 
            [
            (',', 2, opAssoc.LEFT),
            ('|', 2, opAssoc.LEFT),
            ])

Converting your test cases to a little testing framework, we can at least test the parsing part:

tests = [
    ("(a, b, c)", ["abc"]),
    ("(a, b | c)", ["ab", "c"]),
    ("((a, b) | c)", ["ab", "c"]),
    ("(a, (b | c))", ["ab", "ac"]),
    ("(a, b, (c | (d, e)), f)", ["abcf","abdef"]),
    ("(a, b, (c | (d, e) | f), g)", ["abcg", "abdeg", "abfg"]),
    ("(a, b, c, ((d, (e | f)) | (g, h)), i)",
      ["abcdei", "abcdfi", "abcghi"]),
    ("((a | b), c)", ["ac", "bc"]),
    ]

for test,expected in tests:
    # if your expected values *must* be lists and not strings, then
    # add this line
    # expected = [list(ex) for ex in expected]
    result = expr.parseString(test)
    print result[0].asList()

which will give you something like this:

['a', ',', 'b', ',', 'c']
[['a', ',', 'b'], '|', 'c']
[['a', ',', 'b'], '|', 'c']
['a', ',', ['b', '|', 'c']]
['a', ',', 'b', ',', ['c', '|', ['d', ',', 'e']], ',', 'f']
['a', ',', 'b', ',', ['c', '|', ['d', ',', 'e'], '|', 'f'], ',', 'g']
['a', ',', 'b', ',', 'c', ',', [['d', ',', ['e', '|', 'f']], '|', ['g', ',', 'h']], ',', 'i']
[['a', '|', 'b'], ',', 'c']

That is the easy part, parsing the string and getting the operator precedence reflected in the resulting structure. Now if you follow the example from the regex inverter, you will need to attach objects to each parsed bit, something like this:

class ParsedItem(object):
    def __init__(self, tokens):
        self.tokens = tokens[0]
class Var(ParsedItem): 
    """ TBD """
class BinaryOpn(ParsedItem):
    def __init__(self, tokens):
        self.tokens = tokens[0][::2]
class Sequence(BinaryOpn):
    """ TBD """
class Alternation(BinaryOpn):
    """ TBD """

variable = oneOf(list(alphas.lower())).setParseAction(Var)
expr = infixNotation(variable, 
            [
            (',', 2, opAssoc.LEFT, Sequence),
            ('|', 2, opAssoc.LEFT, Alternation),
            ])

Now you will have to implement the bodies of Var, Sequence, and Alternation. You will not get a list of values directly back from pyparsing, instead you will get one of these object types back. Then, instead of calling asList() as I did in the sample above, you'll call something like generate or makeGenerator to get a generator from that object. Then you'll invoke that generator to have the objects generate all the different results for you.

I leave the rest as an exercise for you.

-- Paul

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top