Question

I'm trying to extract matches for the pattern def ([^\s]+)\([^\.]*\) in Python. However, when I have multiline input, only the first occurrence is returned. I have specified the re.MULTILINE option on my Python regular expression, but to no avail. Let's say I have the following input:

def a():
    pass
b()
def b():
    pass

My regular expression extracts only 'a' and does not go on to extract 'b'. The code I'm using is:

self.function_re = re.compile(r'def (\S+)\([^\.]*\)', re.MULTILINE)
print(self.function_re.findall(self.code))

This outputs ['a'].

Solution

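Your pattern for the parameter list is most likely too greedy: the negated class [^\.]* also matches newlines, so it runs all the way to the last closing parenthesis in the string. Note that re.MULTILINE only changes where ^ and $ match, so it has no effect on this pattern. Try using def (\S+)\([^\.]*?\) instead (note the ? after the "zero or more" quantifier for your parameter list, which makes it non-greedy).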
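
To see the difference side by side, here is a minimal sketch that runs both patterns over the input from the question. The variable names (code, greedy, lazy) are only illustrative and not part of the original code.

```python
import re

# The sample input from the question, written as a single string.
code = "def a():\n    pass\nb()\ndef b():\n    pass\n"

# Original pattern: [^\.]* is greedy and also matches newlines,
# so it consumes everything up to the LAST ")" in the input.
greedy = re.compile(r'def (\S+)\([^\.]*\)', re.MULTILINE)

# Same pattern with *? so the parameter list stops at the FIRST ")".
lazy = re.compile(r'def (\S+)\([^\.]*?\)', re.MULTILINE)

greedy_names = greedy.findall(code)
lazy_names = lazy.findall(code)

print(greedy_names)  # ['a']
print(lazy_names)    # ['a', 'b']
```

The re.MULTILINE flag is kept only to mirror the question; since neither pattern uses ^ or $, removing it should not change the result.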

OTHER TIPS

It's because the \([^\.]*\) part is greedy, i.e. it matches everything from the first opening parenthesis down to the very last closing one (the negated class [^\.] also matches newlines):

>>> r = re.compile(r'def ([^\s]+)(\([^\.]*\))')
>>> r.findall(test)
[('a', '():\n        pass\nb()\ndef b()')]

If you make it non-greedy by appending ? to the star, it works as expected:

>>> r = re.compile(r'def ([^\s]+)\([^\.]*?\)')
>>> r.findall(test)
['a', 'b']
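
As an alternative to the non-greedy quantifier, the parameter list can be written as "anything except a closing parenthesis". The sketch below is one possible variant, not the approach from the answers above: it also anchors the match with ^, which is the one place where re.MULTILINE actually matters. The sample input is slightly extended with parameters for illustration, and the pattern still assumes that the parameter list contains no nested parentheses.

```python
import re

# Sample input, extended with a parameter list that contains "=".
code = "def a():\n    pass\nb()\ndef b(x, y=1):\n    pass\n"

# ^ with re.MULTILINE matches at the start of every line;
# [ \t]* allows indented definitions;
# [^)]* cannot run past the first ")" so no lazy quantifier is needed.
pattern = re.compile(r'^[ \t]*def (\w+)\([^)]*\)', re.MULTILINE)

names = pattern.findall(code)
print(names)  # ['a', 'b']
```

Unlike [^\.]*?, this variant does not stop working when a default value contains a dot (for example y=1.5), but it will still be confused by a default such as y=(1, 2).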
Licensed under: CC-BY-SA with attribution