Question

import re

string = "input-ports 6012, 6017, 6016"
m = re.match(r"input-ports(\s\d{4},?)(\s\d{4},?)(\s\d{4},?)", string)
print(m.groups())  #=> (' 6012,', ' 6017,', ' 6016')

But when I use group repetition, it only returns the last number:

m = re.match(r"input-ports(\s\d{4},?)+", string)
print(m.groups())  #=> (' 6016',)

Can anyone tell me why?


Solution

Traditional regex engines, including Python's built-in re module, remember and return only the last match of a repeated group. Some more advanced libraries provide a captures property that holds every match for a given group. The third-party regex library for Python does this, among other nice things:

import regex

string = "input-ports 6012, 6017, 6016"
m = regex.match(r"input-ports(?:\s(\d{4}),?)+", string)
print(m.captures(1))  # ['6012', '6017', '6016']

If you can't use this library, one workaround is to use findall and replace the repetition with a single pattern wrapped in lookarounds (a lookbehind and a lookahead). This is not always possible, but your example is easy:

import re

string = "input-ports 6012, 6017, 6016"
m = re.findall(r"(?<=\s)\d{4}(?=,|$)", string)
print(m)  # ['6012', '6017', '6016']
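
One thing to keep in mind with the findall approach: the lookaround pattern no longer checks for the "input-ports" prefix, so it matches four-digit numbers on any line. A minimal sketch of one way to guard against that, assuming each line is processed separately (the names PORT_RE and input_ports are made up for illustration):

```python
import re

# Same lookaround pattern as above, compiled once for reuse.
PORT_RE = re.compile(r"(?<=\s)\d{4}(?=,|$)")

def input_ports(line):
    """Return the four-digit ports on an 'input-ports' line, else an empty list."""
    # Hypothetical guard: the pattern itself no longer checks the prefix.
    if not line.startswith("input-ports"):
        return []
    return PORT_RE.findall(line)

print(input_ports("input-ports 6012, 6017, 6016"))  # ['6012', '6017', '6016']
print(input_ports("output-ports 7000, 7001"))       # []
```

Without the startswith check, the second call would return ['7000', '7001'].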

OTHER TIPS

Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data

on regex101
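
The tip above can be sketched with the standard re module alone. This is only an illustration, assuming the ports are always four digits: the repeated group is made non-capturing, an outer group captures the whole list, and a second pass pulls out the individual numbers.

```python
import re

string = "input-ports 6012, 6017, 6016"

# Repeated capturing group: only the last iteration is kept.
last_only = re.match(r"input-ports(\s\d{4},?)+", string)
print(last_only.groups())  # (' 6016',)

# Outer capturing group around the repeated (now non-capturing) group.
whole = re.match(r"input-ports((?:\s\d{4},?)+)", string)
print(whole.group(1))  # ' 6012, 6017, 6016'

# Second pass over the captured text to get each port.
ports = re.findall(r"\d{4}", whole.group(1))
print(ports)  # ['6012', '6017', '6016']
```

Unlike a bare findall over the whole string, this still requires the "input-ports" prefix to match before any numbers are extracted.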

Licensed under: CC-BY-SA with attribution