My grammar stops on an embedded '/'. I tried adding it as an allowed value, but it isn't working

StackOverflow https://stackoverflow.com/questions/21768029

  •  11-10-2022

Question

Using Python 2.7 and pyparsing 2.0.1.

I tried adding a slash as an allowed character to my pyparsing grammar, but it isn't being picked up; parsing stops at that point instead. In the past I've been able to get around this sort of thing by adding punctuation characters to various buffers and adding those to the grammar, but this time it's not working, so it appears to be something a little more complex. My guess is that, because of my ill-formed grammar definition, it is expecting a keyword instead of more free text. In my grammar I expect keyword: followed by free text, which can span multiple lines and is terminated by the next keyword:. Here is the source code that shows the example:

from pyparsing import *
from string import whitespace

def test(phrase):
        """
        try to grab a  "keyword: " and free text following the keyword
        """

        print 'Phrase \n         1         2         3\n'
        print '123456789012345678901234567890\n'
        print '%s\n' % phrase
        kw = Combine(Word(alphas + nums) + Literal(':'))('KEY')
        punc = "".join([printables.replace(':', ''), ')', '[', ']', '(', ')',
                        '/', '.'])
        # but punc now has '/' in it twice

        kw.setDebug(True)
        body1 = originalTextFor(OneOrMore(~kw + (Word(alphas + nums) | punc)))('BODY1')
        body2 = originalTextFor(OneOrMore(~kw + (Word(punc + alphas + nums) | punc)))('BODY2')
        body3 = originalTextFor(OneOrMore(~kw + (Word(whitespace) | punc)))('BODY3')
        body1.setDebug(True)
        body2.setDebug(True)
        body3.setDebug(True)
        grammar = OneOrMore(Group(kw + body1) | Group(kw + body2) | Group(kw + body3))

        print ("grammar %s" % grammar)
        output = grammar.parseString(phrase)
        print ("Test output %s" % output)
        for res in output:
            print res.dump()


if __name__ == '__main__':

    phrase = """


COTTON: (RAW) NEED HARVEST DATE.


SALAMI: (COOKED) SOUTHERN VARIES; SUGGEST ALT.

PEPPER:  ON TREE/ROOTS UNDERGROUND REQUEST PERMISSION
TO DIG PLANT AND RELOCATE.
"""
    # when I run the output stops at '/' in the 'pepper' parsing.
    test(phrase)

So parsing stops when it hits the '/' in my input text.

If I add the following after the parseString call:

    result, start, end = next(grammar.scanString(phrase))
    print len(phrase), end
    print 'NOTICE:'
    print phrase[end:end+10]

I get the following output as confirmation:

Exception raised:Expected W:(abcd...) (at char 102), (line:9, col:17)
167 102
NOTICE:
/ROOTS UND

That is where I thought it was stopping: at the '/' character. I've tried adding alternative rules for the '/' by adding it to the allowed punctuation, but so far I have not succeeded. I think what makes this one trickier is that the '/' has no whitespace around it.
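
One thing worth checking here, offered as a sketch rather than a confirmed diagnosis: in the code above, punc is a plain Python string, and pyparsing promotes a bare string used with `|` to a Literal, which matches the whole string in sequence rather than any one of its characters. The short `punc` value and the `matches` helper below are illustrative stand-ins, not part of the original code.

```python
from pyparsing import Literal, ParseException, Word, alphanums

punc = "/."                      # short stand-in for the longer punc string
expr = Word(alphanums) | punc    # the bare str is promoted to a Literal

# The second alternative is a Literal for the whole string "/."
promoted = expr.exprs[1]
is_literal = isinstance(promoted, Literal)


def matches(parser, text):
    """Return True if parser matches at the start of text (hypothetical helper)."""
    try:
        parser.parseString(text)
        return True
    except ParseException:
        return False


whole_string_ok = matches(expr, "/.")   # the full "/." sequence is present
single_char_ok = matches(expr, "/")     # a lone '/' is not the whole Literal

# Wrapping the string in Word() treats it as a set of allowed characters.
fixed = Word(alphanums) | Word(punc)
fixed_ok = matches(fixed, "/")

print("%s %s %s %s" % (is_literal, whole_string_ok, single_char_ok, fixed_ok))
```

If this reading applies, the `| punc` alternative in body1, body2 and body3 would never match a lone '/', which is consistent with the parse stopping there.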

Any ideas?


Solution

After a lot of tweaking I can make it work, but I'm not sure why it has to be done this way:

from pyparsing import *
from string import whitespace

def test(phrase):
        """
        @summary: try to grab a "keyword: " and the free text following
        the keyword
        @param phrase: a phrase of text to parse
        @type phrase: str
        @date: 20140213
        """
        test = 1
        print 'Phrase \n         1         2         3\n'
        print '123456789012345678901234567890\n'
        print '%s\n' % phrase
        kw = Combine(Word(alphas + nums) + Literal(':'))('KEY')
        punc = printables.replace(':', '')
        p2 = oneOf(")[]()/.")
        # but punc now has '/' in it twice
        p3 = punc | p2
        kw.setDebug(True)
        s = OneOrMore(p3)
        # pepper handles body1
        # http://structure.usc.edu/pyparsing/pyparsing.Word-class.html
        #      init char, body chars
        body1 = originalTextFor(OneOrMore(~kw + OneOrMore(Word(alphas +
                                            '/.' + nums))))('BODY1')
        body2 = originalTextFor(OneOrMore(~kw + (Word(alphas + punc + nums))))(
            'BODY2')
        body1.setDebug(True)
        body1.setName('BODY1')
        body2.setDebug(True)
        body2.setName('BODY2')
        grammar = OneOrMore(Group(kw + body1) | Group(kw + body2))

        print '============= %s =================' % test
        # this grabs only the first one
        print ("grammar %s" % grammar)
        output = grammar.parseString(phrase)
        print 'XXXXXXXXXXXXX'
        print 'XXXXXXXXXXXXX'
        print 'XXXXXXXXXXXXX'
        result, start, end = next(grammar.scanString(phrase))
        print len(phrase), end
        print 'NOTICE:'
        print phrase[end:end+10]

        print ("Test %d output %s" % (test, output))
        for res in output:
            print res.dump()


if __name__ == '__main__':

    phrase = """


COTTON2: (RAW) NEED HARVEST DATE.


SALAMI2: (COOKED) SOUTHERN VARIES; SUGGEST ALT.

PEPPER1:  ON TREE/ROOTS UNDERGROUND REQUEST PERMISSION
TO DIG PLANT AND RELOCATE.
"""
    # when I run the output stops at '/' in the 'pepper' parsing.
    test(phrase)

I relabeled COTTON, SALAMI and PEPPER with numbers to show which rule each one triggers: COTTON and SALAMI both trigger body2, and PEPPER triggers body1. I don't really like this solution, as it is a bit hardcoded, and it doesn't make sense to me.

When I run it I get the following output (it doesn't seem to like the EOF condition):

Exception raised:Expected W:(abcd...) (at char 170), (line:11, col:1)
170 169
NOTICE:


Test 1 output [['COTTON2:', '(RAW) NEED HARVEST DATE.'], ['SALAMI2:', '(COOKED) SOUTHERN VARIES; SUGGEST ALT.'], ['PEPPER1:', 'ON TREE/ROOTS UNDERGROUND REQUEST PERMISSION\nTO DIG PLANT AND RELOCATE.']]
['COTTON2:', '(RAW) NEED HARVEST DATE.']
- BODY2: (RAW) NEED HARVEST DATE.
- KEY: COTTON2:
['SALAMI2:', '(COOKED) SOUTHERN VARIES; SUGGEST ALT.']
- BODY2: (COOKED) SOUTHERN VARIES; SUGGEST ALT.
- KEY: SALAMI2:
['PEPPER1:', 'ON TREE/ROOTS UNDERGROUND REQUEST PERMISSION\nTO DIG PLANT AND RELOCATE.']
- BODY1: ON TREE/ROOTS UNDERGROUND REQUEST PERMISSION
TO DIG PLANT AND RELOCATE.
- KEY: PEPPER1:

Process finished with exit code 0

But it does process all of the input, including the embedded '/' that it was bombing on earlier.

So there is still a bit of a question as to what is going on with the lookahead rule and the embedded free-text rule in body1 and body2:

    body1 = originalTextFor(OneOrMore(~kw + OneOrMore(Word(alphas +
                                        '/.' + nums))))('BODY1')
    body2 = originalTextFor(OneOrMore(~kw + (Word(alphas + punc + nums))))(
        'BODY2')
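
One possible reading of the body1/body2 interaction, as a sketch and not a confirmed diagnosis: the `|` operator builds a MatchFirst, so the first alternative that matches at all is taken, even if it stops early and a later alternative would have consumed more. The `narrow` and `wide` rules below are simplified, hypothetical stand-ins for body1 and body2 (no keyword lookahead), just to show the ordering effect.

```python
from pyparsing import OneOrMore, Word, alphanums, originalTextFor, printables

# Simplified stand-ins for the two body rules (illustrative names).
narrow = originalTextFor(OneOrMore(Word(alphanums)))   # letters and digits only
wide = originalTextFor(OneOrMore(Word(printables)))    # any printable character

text = "ON TREE/ROOTS UNDERGROUND"

# '|' builds a MatchFirst: the first alternative that matches wins,
# even though it stops just before the '/'.
first_match = (narrow | wide).parseString(text)[0]

# '^' builds an Or: all alternatives are tried and the longest match wins.
longest_match = (narrow ^ wide).parseString(text)[0]

print(repr(first_match))
print(repr(longest_match))
```

Under this reading, a body1 that accepts only letters and digits would succeed on the start of the PEPPER text and stop at the '/', and body2 would never get a chance; adding '/.' to body1 lets it run through the whole entry.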

On a whim I just added '/.' to body1, since PEPPER seemed to be handled by body1. I tried adding punc to body1 as well, but that didn't work; it actually made things worse.

However, the above solution (the full function source) works.
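
A less hardcoded variant might use a single body rule. This is only a sketch, assuming a keyword is always a run of letters or digits followed directly by ':' and that any other printable run is free text; the names `word`, `body` and `entries` are illustrative. It passes `parseAll=True` so that an incomplete parse raises a ParseException instead of stopping silently.

```python
from pyparsing import (Combine, Group, Literal, OneOrMore, Word,
                       alphanums, originalTextFor, printables)

# A keyword: letters/digits immediately followed by ':'.
kw = Combine(Word(alphanums) + Literal(':'))('KEY')

# One body rule: any run of printable characters, as long as the
# negative lookahead (~kw) says a new keyword is not starting here.
word = Word(printables)
body = originalTextFor(OneOrMore(~kw + word))('BODY')

grammar = OneOrMore(Group(kw + body))

phrase = """
COTTON: (RAW) NEED HARVEST DATE.

PEPPER:  ON TREE/ROOTS UNDERGROUND REQUEST PERMISSION
TO DIG PLANT AND RELOCATE.
"""

# parseAll=True requires the grammar to consume the whole input.
entries = grammar.parseString(phrase, parseAll=True)
for entry in entries:
    print("%s -> %r" % (entry['KEY'], entry['BODY']))
```

Because the lookahead does the work of ending each body, no per-character tuning of the body rule is needed; the trade-off is that any word in the free text that looks like `WORD:` will start a new entry.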

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow