Matching a pair of comments in HTML using regular expressions
09-09-2019
Question
I have a mako template that looks something like this:
% if staff:
<!-- begin staff -->
...
<!-- end staff -->
% endif
That way, if I pass the staff variable as True, those comments should appear in the rendered output. I'm trying to test this with a regular expression that looks like this:
re.search('<!-- begin staff -->.*<!-- end staff -->', text)
I've verified that the comments appear in the HTML output, but the regular expression doesn't match. I've even tried putting the comments (<!-- begin staff --> and <!-- end staff -->) through re.escape, but still no luck. What am I doing wrong?
Or is there a better way to run this test?
Solution
By default, the . metacharacter doesn't match newlines, so you need to add the re.DOTALL option:
re.search('<!-- begin staff -->.*<!-- end staff -->', text, re.DOTALL)
If you have more than one staff section, you might also want to make the match non-greedy, so that it stops at the first <!-- end staff --> rather than the last one:
re.search('<!-- begin staff -->.*?<!-- end staff -->', text, re.DOTALL)
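To see the difference between the three patterns, here is a small self-contained sketch. The text value is a made-up stand-in for the rendered template (two staff sections with a public paragraph between them), and the names BEGIN, END, no_flag, greedy, lazy and sections are only illustrative.

```python
import re

# Hypothetical rendered output; the real template output will differ.
text = """<html>
<!-- begin staff -->
first
<!-- end staff -->
<p>public</p>
<!-- begin staff -->
second
<!-- end staff -->
</html>"""

BEGIN = "<!-- begin staff -->"
END = "<!-- end staff -->"

# Without re.DOTALL, '.' stops at the newline after the begin comment,
# so the pattern cannot reach the end comment and search() returns None.
no_flag = re.search(BEGIN + ".*" + END, text)

# With re.DOTALL, the greedy '.*' runs from the first begin comment
# to the LAST end comment, swallowing the public paragraph in between.
greedy = re.search(BEGIN + ".*" + END, text, re.DOTALL)

# The non-greedy '.*?' stops at the FIRST end comment instead.
lazy = re.search(BEGIN + ".*?" + END, text, re.DOTALL)

# findall() with a non-greedy group returns the body of each section.
sections = re.findall(BEGIN + "(.*?)" + END, text, re.DOTALL)

print(no_flag is None)
print("public" in greedy.group(0))
print("public" in lazy.group(0))
print(len(sections))
```

In a test, asserting on re.search(..., re.DOTALL) being not None is enough to check that the staff block was rendered; the non-greedy form matters mainly when you want to inspect each section separately.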
OTHER TIPS
Use an HTML parser such as HTMLParser instead. See the question "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for the reasons.
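As a rough illustration of the parser approach, the sketch below collects HTML comments with the standard library parser and checks that the begin marker appears before the end marker. It assumes Python 3, where the class lives in html.parser (in Python 2 the module was named HTMLParser); CommentCollector and has_staff_section are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every HTML comment in the fed document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between '<!--' and '-->', e.g. ' begin staff '
        self.comments.append(data.strip())


def has_staff_section(html):
    """Return True if 'begin staff' is followed later by 'end staff'."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    comments = collector.comments
    if "begin staff" not in comments or "end staff" not in comments:
        return False
    return comments.index("begin staff") < comments.index("end staff")


staff_page = "<div><!-- begin staff -->\n<p>secret</p>\n<!-- end staff --></div>"
public_page = "<div><p>nothing to see</p></div>"

print(has_staff_section(staff_page))
print(has_staff_section(public_page))
```

Because the parser hands over each comment separately, newlines and other markup between the two markers no longer affect the check.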