Question

I want to match the docstrings of a Python file, e.g.:

r""" Hello this is Foo
     """

Handling only the """ delimiter should be enough for a start.

>>> data = 'r""" Hello this is Foo\n     """'
>>> def display(m):
...     if not m:
...             return None
...     else:
...             return '<Match: %r, groups=%r>' % (m.group(), m.groups())
...
>>> import re
>>> print display(re.match('r?"""(.*?)"""', data, re.S))
<Match: 'r""" Hello this is Foo\n     """', groups=(' Hello this is Foo\n     ',)>
>>> print display(re.match('r?(""")(.*?)\1', data, re.S))
None

Can someone please explain to me why the first expression matches and the other does not?


Solution

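You are using the escape sequence \1 instead of the backreference \1. In an ordinary (non-raw) string literal, Python turns \1 into an octal escape for the control character chr(1) before re ever sees the pattern. The regex engine therefore looks for a literal \x01 character after the lazy group instead of repeating group 1, and since data contains no such character, the match fails.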
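A minimal check, written for Python 3 (where '\1' is still a valid octal escape, so the behaviour matches the Python 2 session in the question), of what the pattern string actually contains:

```python
import re

# In an ordinary string literal, \1 is an octal escape for the control character U+0001.
plain = '\1'
print(repr(plain))   # '\x01'
print(len(plain))    # 1

# Escaping the backslash, or using a raw string, gives re two characters: a backslash and a 1.
escaped = '\\1'
raw = r'\1'
print(escaped == raw)  # True
print(len(raw))        # 2

# The non-raw pattern therefore ends in a literal \x01, which data never contains.
data = 'r""" Hello this is Foo\n     """'
print(re.match('r?(""")(.*?)\1', data, re.S))  # None
```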

You can fix this by escaping the backslash before the 1, so that re receives \1:

print display(re.match('r?(""")(.*?)\\1', data, re.S))

You can also fix it by using a raw string for the regex, in which Python does not process escape sequences:

print display(re.match(r'r?(""")(.*?)\1', data, re.S))
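For reference, here is a sketch of both fixes in Python 3 syntax (print as a function, no display helper). The last part is a hypothetical extension, not something the question asks for: because \1 repeats whatever group 1 matched, one pattern can accept either """ or ''' and require the docstring to close with the same delimiter.

```python
import re

data = 'r""" Hello this is Foo\n     """'

# Both spellings hand the backreference \1 to the regex engine.
m1 = re.match('r?(""")(.*?)\\1', data, re.S)
m2 = re.match(r'r?(""")(.*?)\1', data, re.S)
print(m1.groups() == m2.groups())  # True
print(m2.groups())  # ('"""', ' Hello this is Foo\n     ')

# Hypothetical extension: group 1 captures the opening delimiter,
# and \1 forces the closing delimiter to be identical.
pattern = re.compile(r'[rR]?("""|\'\'\')(.*?)\1', re.S)
single = "r''' Hello this is Foo\n     '''"
print(repr(pattern.match(single).group(2)))  # ' Hello this is Foo\n     '

# Mismatched delimiters do not match.
print(pattern.match("''' abc \"\"\""))  # None
```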

OTHER TIPS

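I think you might be missing the re.DOTALL or re.MULTILINE flags. In this case, re.DOTALL allows your regex .*? to match newlines as well. Note, however, that re.S, which the question already passes, is an alias for re.DOTALL, and re.MULTILINE only changes how ^ and $ behave, so neither flag explains why the second pattern fails.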
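A small Python 3 check of the flags involved, assuming the same data string as the question and the corrected raw-string pattern:

```python
import re

data = 'r""" Hello this is Foo\n     """'
pattern = r'r?(""")(.*?)\1'

# re.S and re.DOTALL are the same flag.
print(re.S == re.DOTALL)  # True

# Without DOTALL, '.' stops at the newline inside the docstring, so nothing matches.
print(re.match(pattern, data))  # None

# MULTILINE does not help: it only affects ^ and $.
print(re.match(pattern, data, re.MULTILINE))  # None

# With DOTALL, '.*?' can cross the newline.
print(repr(re.match(pattern, data, re.DOTALL).group(2)))  # ' Hello this is Foo\n     '
```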

Licensed under: CC-BY-SA with attribution