You need to switch on newline matching for '.'; it does not match a newline otherwise:
re.search(r'\{cite web(.*?)([\r\n]+)(.*?)\}\}', inputtext, flags=re.DOTALL)
You have multiple newlines spread throughout the text you want to match, so matching just one set of consecutive newlines is not enough.
From the re.DOTALL documentation:
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
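The effect of the flag can be checked on a small string. This is a minimal sketch; the sample text and the variable names are made up for illustration:

```python
import re

# Hypothetical sample: a short stanza spread over several lines.
sample = "{{cite web\n|title=Testing\n}}"

# Without re.DOTALL, '.' stops at the first newline, so the pattern
# cannot reach the closing braces on a later line.
without_flag = re.search(r'\{cite web.*?\}\}', sample)

# With re.DOTALL, '.' also matches '\n', so the whole stanza is matched.
with_flag = re.search(r'\{cite web.*?\}\}', sample, flags=re.DOTALL)

print(without_flag)               # None
print(repr(with_flag.group(0)))   # '{cite web\n|title=Testing\n}}'
```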
You could use one re.sub() call to remove all newlines within the cite stanza in one go, without a loop:
re.sub(r'\{cite web.*?[\r\n]+.*?\}\}', lambda m: re.sub(r'\s*[\r\n]\s*', '', m.group(0)), inputtext, flags=re.DOTALL)
This uses a nested regular expression to remove all whitespace with at least one newline in it from the matched text.
Demo:
>>> import re
>>> inputtext = '''\
... {{cite web
... |title=Testing
... |url=Testing
... |editor=Testing
... }}
... '''
>>> re.search(r'\{cite web(.*?)([\r\n]+)(.*?)\}\}', inputtext, flags=re.DOTALL)
<_sre.SRE_Match object at 0x10f335458>
>>> re.sub(r'\{cite web.*?[\r\n]+.*?\}\}', lambda m: re.sub(r'\s*[\r\n]\s*', '', m.group(0)), inputtext, flags=re.DOTALL)
'{{cite web|title=Testing|url=Testing|editor=Testing}}\n'
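If the substitution is needed in more than one place, the same idea could be wrapped in a small function with precompiled patterns. This is only a sketch: the names CITE_STANZA, NEWLINE_RUN and collapse_cite_newlines are hypothetical, and the surrounding "Intro"/"Outro" lines are invented to show that text outside the stanza is left alone.

```python
import re

# Hypothetical names; the patterns themselves are the ones used above.
CITE_STANZA = re.compile(r'\{cite web.*?[\r\n]+.*?\}\}', flags=re.DOTALL)
NEWLINE_RUN = re.compile(r'\s*[\r\n]\s*')


def collapse_cite_newlines(text):
    """Remove whitespace runs containing a newline inside cite web stanzas."""
    # The outer pattern finds each multi-line stanza; the inner pattern
    # strips newline-containing whitespace from just that matched text.
    return CITE_STANZA.sub(lambda m: NEWLINE_RUN.sub('', m.group(0)), text)


# Invented sample input with text before and after the stanza.
inputtext = (
    "Intro line\n"
    "{{cite web\n"
    "|title=Testing\n"
    "|url=Testing\n"
    "}}\n"
    "Outro line\n"
)

result = collapse_cite_newlines(inputtext)
print(repr(result))
# 'Intro line\n{{cite web|title=Testing|url=Testing}}\nOutro line\n'
```

Compiling the patterns once mainly helps readability here; the behaviour is the same as the one-line re.sub() call.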