Question

I have a string and need to extract parts of it. The problem is that I can't express a repetition within a repetition. Here is the code:

import re

f = "Makimak-cg_mk_Mokarmi"
pattern = "([A-Za-z][A-Za-z0-9]+)((?:[-_]([a-z]{2}))+)"
mO = re.match(pattern, f)
print(mO.groups())

And the result will be:

('Makimak', '-cg_mk', 'mk')

But I would like to get tuple like this:

('Makimak', '-cg_mk', 'cg', 'mk')

So the group "-cg_mk" contains a repetition of the two-character pattern, but there is no syntax like this:

[a-z]{2}+

The groups of the result contain only the last iteration of the repeated group, which is expressed here:

([a-z]{2})

My thought was that there should be a "+" too like this:

([a-z]{2})+

It gives the same result. The match object is created; I just can't get the groups that I want.

Solution

You may need to do this in two steps:

>>> import re
>>> f = "Makimak-cg_mk_Mokarmi"
>>> pattern = "([A-Za-z][A-Za-z0-9]+)((?:[-_][a-z]{2})+)"
>>> mO = re.match(pattern, f)
>>> print(mO.groups() + tuple(re.split('[-_]', mO.group(2))[1:]))
('Makimak', '-cg_mk', 'cg', 'mk')

This captures just the groups ('Makimak', '-cg_mk') and then appends the result of splitting the second group on occurrences of - or _. The [1:] slice drops the empty string that re.split returns before the leading separator.
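
As a variation on the same two-step idea, the second step could use re.findall instead of re.split, which avoids the empty leading element. This is only a sketch: the helper name split_repeated is made up for illustration, and it assumes the input has the same shape as the string in the question.

```python
import re

def split_repeated(text):
    """Hypothetical helper: return the outer groups plus each repeated part."""
    # Step 1: capture the name and the whole repeated run as two groups.
    outer = re.match(r"([A-Za-z][A-Za-z0-9]+)((?:[-_][a-z]{2})+)", text)
    if outer is None:
        return None
    # Step 2: with a single capturing group, findall returns a list of
    # that group's text for every non-overlapping match in the run.
    parts = re.findall(r"[-_]([a-z]{2})", outer.group(2))
    return outer.groups() + tuple(parts)

print(split_repeated("Makimak-cg_mk_Mokarmi"))
# ('Makimak', '-cg_mk', 'cg', 'mk')
```

A repeated capturing group in Python's re module keeps only its last iteration, which is why a second pass over group 2 is needed in either version.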

If you always knew the exact number of two-character patterns, you could accomplish this with a lookahead, but it doesn't seem like that is known up front, or you wouldn't need the repetition.
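
For illustration, here is roughly what the lookahead version might look like if the count were fixed. The assumption, which does not come from the question, is that there are always exactly two two-letter parts; the variable names are arbitrary.

```python
import re

# Assumption for this sketch: exactly two two-letter parts follow the name.
# The lookahead captures the whole run as group 2 without consuming it,
# so the parts can then be matched one by one as groups 3 and 4.
fixed = (r"([A-Za-z][A-Za-z0-9]+)"
         r"(?=((?:[-_][a-z]{2}){2}))"
         r"[-_]([a-z]{2})[-_]([a-z]{2})")

m = re.match(fixed, "Makimak-cg_mk_Mokarmi")
fixed_groups = m.groups()
print(fixed_groups)
# ('Makimak', '-cg_mk', 'cg', 'mk')
```

Every extra part would need another explicit group in the pattern, so this only works when the count is known in advance.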

Licensed under: CC-BY-SA with attribution