Question

I have a string that can contain something like this:

s = "'Mark, Bob','John'"

What is the best way to parse this into 3 strings? I am really new to pyparsing, and I am afraid that I don't understand it too well.

(edit) I am sorry, I was not very clear. This is part of a program that uses a grammar and pyparsing to parse a file. This is a small part of the input that I am not sure what to do with. It should really represent an array of three names; that is what I would like to get out of it.

Thanks


Solution

I'm assuming you actually wanted 2 strings, not 3 (judging by the quotation marks).

To use pyparsing, start by thinking through and writing down the format you want to parse (a good first step no matter which parsing library or tool you use). It can be as rigorous as you like, but let's keep it simple and high-level for this problem. I'll use a quasi-BNF form, where '*' means "0 or more repetitions":

list_of_names = quoted_string (',' quoted_string)*

"A list of names is a quoted string, followed by 0 or more comma and quoted string pairs."

Pyparsing's classes use names that, while perhaps a little verbose for coding, fairly accurately follow that same form.

from pyparsing import quotedString, ZeroOrMore, Suppress, removeQuotes, delimitedList
list_of_names = quotedString + ZeroOrMore(',' + quotedString)

Pyparsing also includes some common expressions, and a quotedString is one of them.

Now that we have defined list_of_names, we can use it to parse your input:

s = "'Mark, Bob','John'"
print(list_of_names.parseString(s))

And we get:

["'Mark, Bob'", ',', "'John'"]

Well, that's ugly. For one thing, we just want the names, not any separating commas. So change list_of_names to:

list_of_names = quotedString + ZeroOrMore(Suppress(',') + quotedString)

And now it's cleaned up a bit:

["'Mark, Bob'", "'John'"]

You weren't clear on whether you wanted to keep the quotation marks or not. Usually when I work with strings, I just want the string content, and not have the string include the quotes. You could certainly write this:

for name in list_of_names.parseString(s):
    print(name.strip("'"))

But there may be lots of things you want to do with this parsed output, and you don't want to have to hassle with stripping off the quotes every time you do something.

So instead, you can define a parse action, a callback to be run at parse time which will clean up those quotes. Pyparsing includes one called removeQuotes, and you include it in your parser like this:

quotedString.setParseAction(removeQuotes)

Now if we parse your input again, we get a pretty clean-looking list:

['Mark, Bob', 'John']
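To make the idea of a parse action a little more concrete, here is a minimal Python 3 sketch. It attaches removeQuotes to a copy of quotedString rather than to quotedString itself (my own precaution, since quotedString is a single object shared by all code that imports pyparsing), and it also shows a hand-written callback, strip_quotes, which is a hypothetical name used only for illustration and does roughly what removeQuotes does.

```python
from pyparsing import Suppress, ZeroOrMore, quotedString, removeQuotes

# copy() gives a private expression, so the parse action is not attached
# to the quotedString object shared by everything that imports pyparsing.
name = quotedString.copy().setParseAction(removeQuotes)
list_of_names = name + ZeroOrMore(Suppress(",") + name)

names = list_of_names.parseString("'Mark, Bob','John'").asList()
print(names)  # ['Mark, Bob', 'John']


def strip_quotes(tokens):
    """Hypothetical hand-written stand-in for removeQuotes."""
    # tokens[0] is the matched text including its quotes; drop the
    # first and last characters.
    return tokens[0][1:-1]


name_by_hand = quotedString.copy().setParseAction(strip_quotes)
list_by_hand = name_by_hand + ZeroOrMore(Suppress(",") + name_by_hand)
by_hand = list_by_hand.parseString("'Mark, Bob','John'").asList()
print(by_hand)  # ['Mark, Bob', 'John']
```

Whatever a parse action returns replaces the matched tokens, so every later consumer of the results sees the cleaned-up strings.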

Lastly, parsing lists of the form something + ZeroOrMore(Suppress(delimiter) + something) comes up a lot, especially with a comma as the delimiter. So pyparsing includes a helper function called delimitedList that builds the same expression, with the comma as its default delimiter. Your whole parser now looks like:

quotedString.setParseAction(removeQuotes)
list_of_names = delimitedList(quotedString)

And you extract the data by calling the parseString method on the list_of_names expression.
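Put together as a runnable Python 3 script, the final parser might look like the sketch below. The parse_names wrapper, the use of parseAll=True, and the error handling are my own additions for illustration, not part of the original answer. The camelCase names are the classic pyparsing spellings, which newer releases keep as compatibility aliases.

```python
from pyparsing import ParseException, delimitedList, quotedString, removeQuotes

# Work on a copy so the module-level quotedString stays unmodified.
quoted_name = quotedString.copy().setParseAction(removeQuotes)
list_of_names = delimitedList(quoted_name)


def parse_names(text):
    """Hypothetical helper: return the unquoted names as a plain list."""
    # parseAll=True makes trailing, unparsed input an error instead of
    # being silently ignored.
    return list_of_names.parseString(text, parseAll=True).asList()


names = parse_names("'Mark, Bob','John'")
print(names)  # ['Mark, Bob', 'John']

try:
    parse_names("'Mark, Bob' 'John'")  # missing comma between the strings
except ParseException as err:
    print("could not parse:", err)
```

Without parseAll=True the second call would quietly return only the first name, which is easy to miss when this is one small piece of a larger grammar.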

Other answers

#!/usr/bin/env python3

from pyparsing import *


s = "'Mark, Bob','John'"

# Throw away quotes and commas, keep only runs of letters
fnames = OneOrMore(Suppress(Literal("'")) | Suppress(Literal('"')) | Suppress(",") | Word(alphas))

for n in fnames.parseString(s):
    print(n)

When run, this outputs just the names:

Mark
Bob
John
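If three names really are the goal, another option is to keep the quoted-string grammar and split each string's content in a parse action. This is only a sketch: split_names is a hypothetical helper, and it assumes a comma inside the quotes always separates two names and never belongs to a name.

```python
from pyparsing import delimitedList, quotedString, removeQuotes


def split_names(tokens):
    """Hypothetical parse action: split 'Mark, Bob' into ['Mark', 'Bob']."""
    # Runs after removeQuotes, so tokens[0] is the content without quotes.
    return [part.strip() for part in tokens[0].split(",")]


# Parse actions run in the order given: first remove the quotes, then split.
quoted_names = quotedString.copy().setParseAction(removeQuotes, split_names)
all_names = delimitedList(quoted_names)

names = all_names.parseString("'Mark, Bob','John'").asList()
print(names)  # ['Mark', 'Bob', 'John']
```

Unlike matching bare words with Word(alphas), this keeps a name made of several words (for example 'Mary Ann') together as one string.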
Licensed under: CC-BY-SA with attribution