pyparsing ParseException: Expected end of line -- general questions

Question

pyparsing takes a different view toward parsing than lex/yacc does. You have to let the classes do some of the work. Here's an example in your code:

    self.columnHeader = OneOrMore(self.aucc) \
                        | OneOrMore(nums) \
                        | OneOrMore(self.blankCharacter) \
                        | OneOrMore(self.specialCharacter)

You are equating OneOrMore with the '+' character of a regex. In pyparsing, this is true for ParserElements, but at the character level, pyparsing uses the Word class:

    self.columnHeader = Word(self.aucc + nums + self.blankCharacter + self.specialCharacter)

OneOrMore works with ParserElements, not characters. Look at:

    OneOrMore(nums)

nums is the string "0123456789", so OneOrMore(nums) will match "0123456789", "01234567890123456789", etc., but not "123". That is what Word is for. OneOrMore will accept a string argument, but will implicitly convert it to a Literal.
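
To see the difference for yourself, here is a small sketch. The matches() helper and the variable names are mine, just for illustration; it uses the older camelCase method names (parseString, asList), which newer pyparsing releases also offer in snake_case.

```python
from pyparsing import OneOrMore, ParseException, Word, nums

digits_as_word = Word(nums)          # one or more characters drawn from nums
digits_as_literal = OneOrMore(nums)  # one or more copies of the literal "0123456789"


def matches(expr, text):
    """Return the parsed tokens as a list, or None if expr does not match text."""
    try:
        return expr.parseString(text).asList()
    except ParseException:
        return None


word_result = matches(digits_as_word, "123")
literal_result = matches(digits_as_literal, "123")
repeated_result = matches(digits_as_literal, "01234567890123456789")

print(word_result)      # Word matches the run of digits
print(literal_result)   # None: "123" is not the literal "0123456789"
print(repeated_result)  # the literal matched twice in a row
```

Word is the one that behaves like the regex `[0-9]+`; OneOrMore only repeats whatever whole expression you hand it.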

This is a fundamental difference between using pyparsing and lex/yacc, and I think it is the source of much of the complexity in your code.

Some other suggestions:

Your code has some premature optimizations in it - you write:

    aucc = ''.join(set([alphas.upper(),"'"]))

Assuming that this will be used for defining Words, just do:

    aucc = alphas.upper() + "'"

There is no harm in having duplicate characters in aucc, since Word will convert this to a set internally.
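
A quick way to convince yourself, as a sketch: build one Word from aucc and another from a string with every character doubled, and compare what they parse. The test string and variable names are made up for the demo. (As far as I know, alphas is the uppercase letters followed by the lowercase ones, so alphas.upper() already contains each letter twice anyway.)

```python
from pyparsing import Word, alphas

aucc = alphas.upper() + "'"
with_duplicates = aucc + aucc  # every character now appears at least twice

# Neither Word allows a space, so each stops at the end of the first word.
plain = Word(aucc).parseString("DON'T STOP").asList()
doubled = Word(with_duplicates).parseString("DON'T STOP").asList()

print(plain)
print(doubled)
```

Both Words accept exactly the same text, so there is nothing to gain from de-duplicating the string yourself.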

Write a BNF for what you want to parse. It does not have to be as rigorous as it would be with lex/yacc. From your samples, it looks something like:

    # sample
    ZOO/STATUS/FOOD ALLOCATION//
    TOPIC/BIRD FEED IS RUNNING LOW//
    FREE/WE HAVE DISCOVERED MOTHS INFESTED THE BIRDSEED AND IT IS NO
    LONGER USABLE.//

    parser :: header topicEntry+
    header :: "ZOO" sep namedValue
    namedValue :: uppercaseWord sep valueBody
    valueBody :: (everything up to //)
    topicEntry :: topicHeader topicBody
    topicHeader :: "TOPIC" sep valueBody
    topicBody :: freeText
    freeText :: "FREE" sep valueBody
    sep :: "/"

Converting to pyparsing, this looks something like:

    from pyparsing import Keyword, Literal, OneOrMore, SkipTo, Word, alphas

    SEP = Literal("/")
    BODY_TERMINATOR = Literal("//")
    FREE_, TOPIC_, ZOO_ = map(Keyword, "FREE TOPIC ZOO".split())
    uppercaseWord = Word(alphas.upper())
    # SkipTo stops in front of the terminator without consuming it,
    # so match (and drop) the "//" explicitly
    valueBody = SkipTo(BODY_TERMINATOR) + BODY_TERMINATOR.suppress()  # adjust later, but okay for now...

    freeText = FREE_ + SEP + valueBody

    topicBody = freeText
    topicHeader = TOPIC_ + SEP + valueBody
    topicEntry = topicHeader + topicBody

    namedValue = uppercaseWord + SEP + valueBody
    zooHeader = ZOO_ + SEP + namedValue

    parser = zooHeader + OneOrMore(topicEntry)

(valueBody will have to get more elaborate when you add support for '://' embedded within a value, but save that for Round 2.)
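
To try the grammar on your sample, here is a self-contained sketch. The SAMPLE string and the tokens variable are mine. Note one assumption: valueBody here matches the "//" terminator explicitly after the SkipTo, because SkipTo on its own stops in front of its target and the next entry would otherwise trip over the leftover "//".

```python
from pyparsing import Keyword, Literal, OneOrMore, SkipTo, Word, alphas

SAMPLE = """\
ZOO/STATUS/FOOD ALLOCATION//
TOPIC/BIRD FEED IS RUNNING LOW//
FREE/WE HAVE DISCOVERED MOTHS INFESTED THE BIRDSEED AND IT IS NO
LONGER USABLE.//
"""

SEP = Literal("/")
BODY_TERMINATOR = Literal("//")
FREE_, TOPIC_, ZOO_ = map(Keyword, "FREE TOPIC ZOO".split())
uppercaseWord = Word(alphas.upper())
# Everything up to "//", then consume and drop the "//" itself.
valueBody = SkipTo(BODY_TERMINATOR) + BODY_TERMINATOR.suppress()

freeText = FREE_ + SEP + valueBody

topicBody = freeText
topicHeader = TOPIC_ + SEP + valueBody
topicEntry = topicHeader + topicBody

namedValue = uppercaseWord + SEP + valueBody
zooHeader = ZOO_ + SEP + namedValue

parser = zooHeader + OneOrMore(topicEntry)

tokens = parser.parseString(SAMPLE).asList()
for token in tokens:
    print(repr(token))
```

The FREE value comes back as one token that still contains the embedded newline, since SkipTo just returns the raw text it skipped over. If the "/" tokens are noise for you, wrapping SEP in Suppress would drop them from the results.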

Don't make things super complicated until you get at least some simple stuff working.