pyparsing string of quoted names

Question 1

I'm assuming you actually wanted 2 strings, not 3 (judging by the quotation marks).

To use pyparsing, start by thinking through and writing down the format you want to parse (a good first step no matter what parsing library or tool you use). The description can be as rigorous as you like, but let's start with a simple, high-level one for this problem. I'll use a quasi-BNF form, where '*' means "0 or more repetitions":

list_of_names = quoted_string (',' quoted_string)*

"A list of names is a quoted string, followed by 0 or more comma and quoted string pairs."

Pyparsing's classes use names that, while perhaps a little verbose for coding, fairly accurately follow that same form.

from pyparsing import quotedString, ZeroOrMore
list_of_names = quotedString + ZeroOrMore(',' + quotedString)

Pyparsing also includes some predefined expressions for common constructs, and quotedString is one of them: it matches a string enclosed in either single or double quotes.

Now that we have defined list_of_names, we can use it to parse your input:

s = "'Mark, Bob','John'"
print(list_of_names.parseString(s))

And we get:

["'Mark, Bob'", ',', "'John'"]
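For reference, here is that first attempt as a small self-contained script. It is a sketch assuming a pyparsing version that accepts the camelCase names used throughout this answer (pyparsing 3 keeps them as compatibility aliases); asList() converts the ParseResults into a plain Python list:

```python
from pyparsing import ZeroOrMore, quotedString

# A quoted string, followed by zero or more (comma, quoted string) pairs
list_of_names = quotedString + ZeroOrMore(',' + quotedString)

s = "'Mark, Bob','John'"
result = list_of_names.parseString(s)

# The separating comma is still in the output at this stage
print(result.asList())  # ["'Mark, Bob'", ',', "'John'"]
```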

Well, that's ugly. For one thing, we just want the names, not any separating commas. So change list_of_names to:

list_of_names = quotedString + ZeroOrMore(Suppress(',') + quotedString)

And now it's cleaned up a bit:

["'Mark, Bob'", "'John'"]
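The same check with the comma wrapped in Suppress, again as a minimal standalone sketch; the comma is still matched during parsing but is dropped from the results:

```python
from pyparsing import Suppress, ZeroOrMore, quotedString

# Suppress(',') must still match a comma, but contributes no token
list_of_names = quotedString + ZeroOrMore(Suppress(',') + quotedString)

result = list_of_names.parseString("'Mark, Bob','John'")
print(result.asList())  # ["'Mark, Bob'", "'John'"]
```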

You weren't clear on whether you wanted to keep the quotation marks. Usually when I work with strings, I just want the string content, without the surrounding quotes. You could certainly write this:

for name in list_of_names.parseString(s):
    print(name.strip("'"))
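One caveat with strip("'"): quotedString also matches double-quoted strings, so a name written as "John" would keep its double quotes. A minimal variation, assuming the input may mix quote styles (the mixed input below is a made-up example), slices off the first and last character instead:

```python
from pyparsing import Suppress, ZeroOrMore, quotedString

list_of_names = quotedString + ZeroOrMore(Suppress(',') + quotedString)

# Hypothetical input mixing both quote styles; quotedString accepts either
s = "'Mark, Bob',\"John\""

# Drop the opening and closing quote character, whichever kind it is;
# strip("'") would leave the double quotes around "John" in place
names = [name[1:-1] for name in list_of_names.parseString(s)]
print(names)  # ['Mark, Bob', 'John']
```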

But there may be lots of things you want to do with this parsed output, and you don't want to have to hassle with stripping off the quotes every time you do something.

So instead, you can define a parse action, a callback to be run at parse time which will clean up those quotes. Pyparsing includes one called removeQuotes, and you include it in your parser like this:

quotedString.setParseAction(removeQuotes)

Now if we parse your input again, we get a pretty clean-looking list:

['Mark, Bob', 'John']
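Note that setParseAction modifies quotedString itself, which is a shared module-level object, so every other grammar in the same program that uses it also gets its quotes removed. A sketch of a more contained alternative attaches the action to a copy. It also shows a hand-written parse action (strip_quotes is a hypothetical name) doing the same job as removeQuotes: it receives the matched tokens and returns the replacement value:

```python
from pyparsing import Suppress, ZeroOrMore, quotedString, removeQuotes

# Attach removeQuotes to a copy; the module-level quotedString is unchanged
name = quotedString.copy().setParseAction(removeQuotes)
list_of_names = name + ZeroOrMore(Suppress(',') + name)
result = list_of_names.parseString("'Mark, Bob','John'")
print(result.asList())  # ['Mark, Bob', 'John']


# Hypothetical hand-written equivalent: pyparsing passes the matched tokens,
# and the returned string replaces the quoted token in the results
def strip_quotes(tokens):
    return tokens[0][1:-1]


name2 = quotedString.copy().setParseAction(strip_quotes)
list_of_names2 = name2 + ZeroOrMore(Suppress(',') + name2)
result2 = list_of_names2.parseString("'Mark, Bob','John'")
print(result2.asList())  # ['Mark, Bob', 'John']
```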

Lastly, parsing lists of the form something + ZeroOrMore(Suppress(delimiter) + something) comes up a lot, especially when the delimiter is a comma. So pyparsing includes a helper function called delimitedList that builds the same expression, using a comma as the default delimiter. Your whole parser now looks like:

from pyparsing import delimitedList, quotedString, removeQuotes

quotedString.setParseAction(removeQuotes)
list_of_names = delimitedList(quotedString)

And you extract the data by calling the parseString method on the list_of_names expression.
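Putting it together, here is a self-contained sketch of the final parser. It keeps the parse action on a copy of quotedString and passes parseAll=True, a design choice not in the original: without it, parseString stops after the last name and silently ignores any trailing text. The second input is a made-up example showing that double quotes and a space after the comma are handled too:

```python
from pyparsing import ParseException, delimitedList, quotedString, removeQuotes

# Work on a copy so the shared quotedString keeps its default behavior
name = quotedString.copy().setParseAction(removeQuotes)
list_of_names = delimitedList(name)

results = [
    list_of_names.parseString(s, parseAll=True).asList()
    for s in ["'Mark, Bob','John'", "\"Mark, Bob\", 'John'"]
]
print(results)  # [['Mark, Bob', 'John'], ['Mark, Bob', 'John']]

# With parseAll=True, leftover text after the list raises ParseException
failed = False
try:
    list_of_names.parseString("'Mark, Bob','John' junk", parseAll=True)
except ParseException:
    failed = True
print(failed)  # True
```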

Question 2

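#!/usr/bin/env python3

from pyparsing import Literal, OneOrMore, Suppress, Word, alphas


s = "'Mark, Bob','John'"

# Discard single quotes, double quotes and commas; keep runs of letters
fnames = OneOrMore(Suppress(Literal("'")) | Suppress(Literal('"')) | Suppress(",") | Word(alphas))

for n in fnames.parseString(s):
    print(n)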

When run, it prints just the names, one per line. Note that this approach splits 'Mark, Bob' into two separate names:

Mark
Bob
John
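If all you need is the alphabetic words, a possibly simpler alternative (a swapped-in technique, not the approach above) is to let searchString scan the input for Word(alphas) matches rather than listing every delimiter to skip. A minimal sketch:

```python
from pyparsing import Word, alphas

s = "'Mark, Bob','John'"

# searchString returns one ParseResults per match found anywhere in the
# input, skipping the quotes, commas and spaces in between
names = [tokens[0] for tokens in Word(alphas).searchString(s)]
print(names)  # ['Mark', 'Bob', 'John']
```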