Question

I am trying to write a function that reads a website config file in the following format:

routename, optional_path, Title With Spaces;
otherroute, other title {
    nestedroute, :can_have_semicolon, Nested Title;
};

differentroute, Awesome Title Ya'll;

(I'm writing an ember app)

So I've written a recursive function that is working fine in all other respects, but the regex to split the actual individual items has caused me all kinds of headaches.

First the whole string given to the recursive function is split with this:

filter(None, re.split(r";\n*(\b|$)", routes))

Which cuts the highest-order groups apart (the above example would end up in three pieces, one with brackets and two without, all without their trailing semicolons).
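
The split step above can be checked in isolation against the sample config. This is only a sketch: it assumes Python 3 (where filter returns an iterator, hence the list() wrapper) and that the whole config is held in one string named routes, as in the question.

```python
import re

# Sample config from the question, as a single string.
routes = (
    "routename, optional_path, Title With Spaces;\n"
    "otherroute, other title {\n"
    "    nestedroute, :can_have_semicolon, Nested Title;\n"
    "};\n"
    "\n"
    "differentroute, Awesome Title Ya'll;\n"
)

# re.split also returns whatever the capturing group (\b|$) matched,
# which is always an empty string here; filter(None, ...) drops those
# along with the empty remainder after the last semicolon.
pieces = list(filter(None, re.split(r";\n*(\b|$)", routes)))

for piece in pieces:
    print(repr(piece))
```

The semicolon inside the braces is followed by a newline and then }, so neither \b nor $ can match there and the nested item stays inside its parent piece.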

If the individual item is found to contain { or } it uses this regex:

route, path, title, bundle = re.match(r"\s*(\w+)\s*,\s*(\S*?)\s*?,?\s*([\w ]+)\s*{\s*(.+)\s*}\s*", r, flags = re.S).groups()

and the bundle is recursively handled after having its tabs "unindented."

Else this regex is used:

route, path, title = re.match(r"\s*(\w+)\s*,\s*(\S*?)\s*?,?\s*([\w ]+)\s*", r).groups()

Here they both are more clearly written with the whitespace codes removed:

re.match(r"(\w+),(\S*?),?([\w ]+){(.+)}", r, flags = re.S).groups()
re.match(r"(\w+),(\S*?),?([\w ]+)", r).groups()

I have been getting all kinds of crazy behavior. Basically, the path is optional, but the title and route are mandatory, and the title needs to be able to include spaces. The stuff in the brackets is grabbed indiscriminately, since it will be recursively handled by these same expressions.

I've tried many variants of these: some don't pick up the path at all, some can't function without it, some don't support : in the path, and some split the title or randomly drop letters or words from it. There have been too many variants with too many inputs to remember. I'm focusing on the simple version and will add the bracket behavior once it works, since the bracket behavior hasn't seemed to give any problems. As it stands, the second regex with the following input:

customer, path, Customer Account

is giving the following output:

('customer', '', 'path')

but without the path:

customer, Customer Account

it's doing what I want it to.

('customer', '', 'Customer Account')

If you need more info, please ask. This is already so long.

Solution

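Your regexp makes the comma optional, but not the path before it, so the two are matched independently: the lazy (\S*?) is happy to match nothing, ,? then matches nothing as well, and the title group grabs "path" instead. You need to make the combination of path and comma optional as a unit: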

(\w+),(?:(\S*?),)?([\w ]+){(.+)}
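
As a rough check, the same change can be applied to the simpler, bracket-free regex from the question with the whitespace handling put back in. The names ORIGINAL, FIXED and parse_item below are made up for this sketch, and converting a missing path to an empty string is an assumption about what the caller wants (the optional group yields None when it is skipped).

```python
import re

# Second regex from the question, unchanged.
ORIGINAL = re.compile(r"\s*(\w+)\s*,\s*(\S*?)\s*?,?\s*([\w ]+)\s*")

# Same regex with the path and its comma wrapped in one optional group.
FIXED = re.compile(r"\s*(\w+)\s*,\s*(?:(\S*?)\s*,)?\s*([\w ]+)\s*")


def parse_item(item, pattern=FIXED):
    """Return (route, path, title) for one bracket-free item (hypothetical helper)."""
    match = pattern.match(item)
    if match is None:
        raise ValueError("not a route item: %r" % item)
    route, path, title = match.groups()
    # path is None when the optional group did not take part in the match.
    return route, path or "", title.strip()


print(parse_item("customer, path, Customer Account", ORIGINAL))
print(parse_item("customer, path, Customer Account"))
print(parse_item("customer, Customer Account"))
print(parse_item("nestedroute, :can_have_semicolon, Nested Title"))
```

The first call reproduces the ('customer', '', 'path') result from the question; the other three use the fixed pattern. Note that [\w ]+ still stops at characters such as an apostrophe, so a title like the one on the last line of the sample config would need a wider character class.
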
Licensed under: CC-BY-SA with attribution