Regex in Python: groups and |
11-01-2021
Question
I can't work out how to write a regular expression. Here is an example:
string = "red\\/banana 36 monkey\\/apple 14 red\\/apple 23 red\\/horse 56 bull\\/red 67 monkey\\/red 45 bull\\/shark 89"
I want a single regex, used through match.group(), that only considers pairs like red/xxxx and xxxx/red, and that captures just the xxxx name rather than the whole pair.
I want to be able to write:
print(match.group("beginningwithred") + " " + match.group("number"))
and obtain:
banana 36
apple 23
horse 56
then write:
print(match.group("endingwithred") + " " + match.group("number"))
and obtain:
bull 67
monkey 45
My current code looks like this:
regex = re.compile('red\\\\\\\\/(?P<beginningwithred>banana|apple|horse)|(?P<endingwithred>bull|monkey)\\\\\\\\/red (?P<number>\d\d)')
iterator = regex.finditer(string)
for match in iterator:
But it doesn't work: I can't seem to use | between groups, and the Python Regex HOWTO doesn't help. I also tried wrapping the two expressions in { }, but that doesn't work either. It can't be very complicated, but I can't figure out what's wrong.
Solution
I don't completely follow, but it sounds like you want a non-capturing group around your alternatives:
(?:foo|bar|baz)
That lets you use | without creating a "real" group.
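To see why the grouping matters, here is a minimal sketch. The toy patterns and the sample text are made up for illustration and are not taken from the question. The | operator has the lowest precedence in a regex, so without parentheses it splits the whole pattern in two, and the trailing number then belongs only to the second alternative.

```python
import re

# Without grouping, `|` splits the entire pattern. This reads as:
# "a/<name>"  OR  "<name>/a <digits>", so `num` exists only on the right side.
ungrouped = re.compile(r'a/(?P<first>\w+)|(?P<second>\w+)/a (?P<num>\d+)')

# With (?:...), only the two alternatives are split, and the number
# that follows the group applies to both of them.
grouped = re.compile(r'(?:a/(?P<first>\w+)|(?P<second>\w+)/a) (?P<num>\d+)')

text = 'a/x 1 y/a 2'

ungrouped_result = [m.groupdict() for m in ungrouped.finditer(text)]
grouped_result = [m.groupdict() for m in grouped.finditer(text)]

for label, result in (('ungrouped', ungrouped_result), ('grouped', grouped_result)):
    print(label, result)
```

In the ungrouped version the first match has no number at all, which is the same symptom as in the question's pattern.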
Update: why doesn't this help? Is this not right?
>>> import re
>>> s="red\\/banana 36 monkey\\/apple 14 red\\/apple 23 red\\/horse 56 bull\\/red 67 monkey\\/red 45 bull\\/shark 89"
>>> r = re.compile(r'(?:red\\/(?P<begin>\w+)|(?P<end>\w+)\\/red)\s+(?P<number>\d+)')
>>> for m in r.finditer(s):
...     print(m.groups())
...
('banana', None, '36')
('apple', None, '23')
('horse', None, '56')
(None, 'bull', '67')
(None, 'monkey', '45')
Update 2: if you just want to print the non-None values, you can do something like:
>>> for m in r.finditer(s):
...     print(','.join(g for g in m.groups() if g is not None))
...
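If the two kinds of match should be printed separately, as the question asks, one possible sketch is to sort them into two lists by checking which named group matched. It reuses the string and pattern from this answer; the list names begins and ends are made up for illustration.

```python
import re

s = "red\\/banana 36 monkey\\/apple 14 red\\/apple 23 red\\/horse 56 bull\\/red 67 monkey\\/red 45 bull\\/shark 89"
r = re.compile(r'(?:red\\/(?P<begin>\w+)|(?P<end>\w+)\\/red)\s+(?P<number>\d+)')

begins, ends = [], []  # hypothetical names, not from the question
for m in r.finditer(s):
    # For each match exactly one of "begin" / "end" is set; the other is None.
    if m.group("begin") is not None:
        begins.append(m.group("begin") + " " + m.group("number"))
    else:
        ends.append(m.group("end") + " " + m.group("number"))

print("\n".join(begins))
print("\n".join(ends))
```

The first print gives the red/xxxx names with their numbers and the second gives the xxxx/red ones, so a single pass over the matches is enough.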
OTHER TIPS
I'm fairly sure it's impossible to write an extra_terrestial_regex that matches all the occurrences (those with 'red' in first position and those with 'red' in second position) and yet behaves so that:
for mat in extra_terrestial_regex.finditer(s):
    print(mat.group("beginningwithred") + " " + mat.group("number"))
selects only the matches with 'red' in first position and skips the others.
A regex alone can't give that result; only a function can. Does the following one do what you want?
import re

s = ('red\\/banana 36 monkey\\/apple 14 '
     'red\\/apple 23 red\\/horse 56 bull\\/red 67 '
     'monkey\\/red 45 bull\\/shark 89')

def gen(s, what, word):
    # Choose the pattern according to where `word` must sit in the pair.
    if what == 'beginning':
        regx = re.compile(r'%s\\/([^ ]+) (\d+)' % word)
    elif what == 'ending':
        regx = re.compile(r'([^ ]+)\\/%s (\d+)' % word)
    else:
        # Unknown `what`: match the whole string once, with two empty groups.
        regx = re.compile(r'(\A).*(\Z)')
    for mat in regx.finditer(s):
        yield mat.groups()

print('\n'.join('%s %s' % x for x in gen(s, 'beginning', 'red')))
print('----------------')
print('\n'.join('%s %s' % x for x in gen(s, 'ending', 'red')))
print('----------------')
print('\n'.join('%s %s' % x for x in gen(s, 'ZOU', 'red')))
print('----------------')
print('\n'.join('%s %s' % x for x in gen(s, 'ending', 'apple')))
Result:
banana 36
apple 23
horse 56
----------------
bull 67
monkey 45
----------------
----------------
monkey 14
red 23
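One caveat with building the pattern by string formatting, as gen() does: if word contained regex metacharacters (a dot or a plus sign, for example), they would be read as regex syntax rather than as literal text. A hedged sketch of the same idea using re.escape follows. The function name find_pairs and the sample string are hypothetical, and a plain "/" separator is assumed to keep it short.

```python
import re

def find_pairs(text, word, position):
    """Yield (name, number) tuples for pairs whose other half is `word`."""
    w = re.escape(word)  # treat `word` as literal text, not regex syntax
    if position == 'beginning':
        pattern = r'%s/([^ /]+) (\d+)' % w
    elif position == 'ending':
        pattern = r'([^ /]+)/%s (\d+)' % w
    else:
        raise ValueError("position must be 'beginning' or 'ending'")
    for m in re.finditer(pattern, text):
        yield m.groups()

# Hypothetical sample data containing a word with metacharacters.
sample = 'c++/java 1 cxx/java 2 go/c++ 3'
beginning = list(find_pairs(sample, 'c++', 'beginning'))
ending = list(find_pairs(sample, 'c++', 'ending'))
print(beginning)
print(ending)
```

Raising an error for an unknown position is a design choice of this sketch; the original gen() instead falls back to a pattern that matches the whole string.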