Regex in Python: groups and |

https://stackoverflow.com/questions/7182516

11-01-2021

Question

I can't work out how to write a regular expression for the following case. Here is an example:

string = "red\\/banana 36    monkey\\/apple 14   red\\/apple 23  red\\/horse 56  bull\\/red 67  monkey\\/red 45    bull\\/shark 89"

I want a single regex, used with match.group(), that only considers the pairs of the form red/xxxx and xxxx/red, and that captures only the xxxx name, not the whole pair.

I want to do:

print(match.group("beginningwithred") + " " + match.group("number"))

and obtain:

banana 36
apple 23
horse 56

then do:

print(match.group("endingwithred") + " " + match.group("number"))

and obtain:

bull 67
monkey 45

My current code looks like this:

iterator = regex.finditer(string)
for match in iterator:
    regex = re.compile('red\\\\\\\\/(?P<beginningwithred>banana|apple|horse)|(?P<endingwithred>bull|monkey)\\\\\\\\/red (?P<number>\d\d)')

But it doesn't work. I can't seem to use | between the groups, and the Python regex HOWTO doesn't help. I also tried wrapping both expressions in { }, but that doesn't work either. It shouldn't be very complicated, but I can't figure out what's wrong.

Solution

I don't completely follow, but it sounds like you want a non-capturing group around your alternatives:

(?:foo|bar|baz)

That lets you use | without creating a "real" (capturing) group.
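
The reason the grouping matters is that | has the lowest precedence in a pattern, so without parentheses it splits the whole expression into two branches, and the number group ends up belonging only to the second one. A minimal sketch of the difference, assuming a shortened sample string and the illustrative names `ungrouped` and `grouped` (neither is from the question):

```python
import re

# Shortened sample (an assumption); raw string, so each pair contains a literal backslash.
s = r"red\/banana 36  bull\/red 67"

# Without grouping, `|` splits the whole pattern into two branches:
#   branch 1: red\/(?P<begin>\w+)
#   branch 2: (?P<end>\w+)\/red\s+(?P<number>\d+)
ungrouped = re.compile(r'red\\/(?P<begin>\w+)|(?P<end>\w+)\\/red\s+(?P<number>\d+)')

# With (?:...), the alternation is confined and the number applies to both branches.
grouped = re.compile(r'(?:red\\/(?P<begin>\w+)|(?P<end>\w+)\\/red)\s+(?P<number>\d+)')

first_ungrouped = ungrouped.search(s)
first_grouped = grouped.search(s)

print(first_ungrouped.group("begin"), first_ungrouped.group("number"))  # banana None
print(first_grouped.group("begin"), first_grouped.group("number"))      # banana 36
```

In the ungrouped pattern the first branch matches without ever reaching the number, which is why that group comes back as None.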

Update: why doesn't this help? Isn't this right?

>>> import re
>>> s="red\\/banana 36    monkey\\/apple 14   red\\/apple 23  red\\/horse 56  bull\\/red 67  monkey\\/red 45    bull\\/shark 89"
>>> r = re.compile(r'(?:red\\/(?P<begin>\w+)|(?P<end>\w+)\\/red)\s+(?P<number>\d+)')
>>> for m in r.finditer(s):
...     print(m.groups())

('banana', None, '36')
('apple', None, '23')
('horse', None, '56')
(None, 'bull', '67')
(None, 'monkey', '45')

Update 2

If you just want to print the non-None values, you can do something like:

  >>> for m in r.finditer(s):
  ...     print(','.join(g for g in m.groups() if g is not None))
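
To get the two separate listings the question asks for, one option is to filter on which named group took part in the match. This is only a sketch that reuses the same string and pattern as the session above; `matches`, `begin_lines`, and `end_lines` are names made up for illustration:

```python
import re

s = "red\\/banana 36    monkey\\/apple 14   red\\/apple 23  red\\/horse 56  bull\\/red 67  monkey\\/red 45    bull\\/shark 89"
r = re.compile(r'(?:red\\/(?P<begin>\w+)|(?P<end>\w+)\\/red)\s+(?P<number>\d+)')

# Materialize the matches once so they can be filtered twice.
matches = list(r.finditer(s))

# A group that did not participate in a match is None, so test for that.
begin_lines = [m.group("begin") + " " + m.group("number")
               for m in matches if m.group("begin") is not None]
end_lines = [m.group("end") + " " + m.group("number")
             for m in matches if m.group("end") is not None]

print("\n".join(begin_lines))  # banana 36 / apple 23 / horse 56, one per line
print("\n".join(end_lines))    # bull 67 / monkey 45, one per line
```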

OTHER TIPS

I'm fairly sure that no single regex (call it extra_terrestial_regex) can match all the occurrences, both those with 'red' in first position and those with 'red' in second position, and still be such that:

for mat in extra_terrestial_regex.finditer(s):
    print(mat.group("beginningwithred") + " " + mat.group("number"))

selects only the matches with 'red' in first position and skips the others.

A regex alone can't produce that result; a function can. Does the following one do what you want?

import re

s = ('red\\/banana 36    monkey\\/apple 14  '
     'red\\/apple 23  red\\/horse 56  bull\\/red 67 '
     'monkey\\/red 45    bull\\/shark 89')


def gen(s, what, word):
    # Pick the pattern according to where `word` sits in the pair.
    if what == 'beginning':
        regx = re.compile(r'%s\\/([^ ]+) (\d+)' % word)
    elif what == 'ending':
        regx = re.compile(r'([^ ]+)\\/%s (\d+)' % word)
    else:
        # Unknown mode: matches the whole string once, with two empty groups.
        regx = re.compile(r'(\A).*(\Z)')
    for mat in regx.finditer(s):
        yield mat.groups()


# Python 3 print calls; each generated tuple is formatted as "name number".
print('\n'.join('%s %s' % x for x in gen(s, 'beginning', 'red')))
print('----------------')
print('\n'.join('%s %s' % x for x in gen(s, 'ending', 'red')))
print('----------------')
print('\n'.join('%s %s' % x for x in gen(s, 'ZOU', 'red')))
print('----------------')
print('\n'.join('%s %s' % x for x in gen(s, 'ending', 'apple')))

result

banana 36
apple 23
horse 56
----------------
bull 67
monkey 45
----------------

----------------
monkey 14
red 23
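
One caveat with interpolating word straight into the pattern: if it ever contains regex metacharacters such as a dot, they are treated as pattern syntax. A possible variant, sketched here under the assumption that word should always be matched literally, passes it through re.escape first. The name `gen_escaped` and the sample string are hypothetical, and unlike gen() this version yields nothing for an unknown mode:

```python
import re


def gen_escaped(s, what, word):
    # Hypothetical variant of gen(): escape `word` so it is matched literally.
    safe = re.escape(word)
    if what == 'beginning':
        regx = re.compile(r'%s\\/([^ ]+) (\d+)' % safe)
    elif what == 'ending':
        regx = re.compile(r'([^ ]+)\\/%s (\d+)' % safe)
    else:
        return  # unknown mode: yield nothing (differs from gen())
    for mat in regx.finditer(s):
        yield mat.groups()


# Made-up sample: without escaping, 'a.b' would also match 'axb'.
sample = r'a.b\/banana 36  axb\/apple 14'
escaped_result = list(gen_escaped(sample, 'beginning', 'a.b'))
print(escaped_result)  # [('banana', '36')]
```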

Licensed under: CC-BY-SA with attribution
