The following does work:
In [49]: re.sub(r'href="([^"]*?)([.]php)?"', r'href="\1.php"', 'href="url.php"')
Out[49]: 'href="url.php"'
In [50]: re.sub(r'href="([^"]*?)([.]php)?"', r'href="\1.php"', 'href="url"')
Out[50]: 'href="url.php"'
The reason your original regex (.+?)(?!php) doesn't quite work is that it matches url.php as follows: (.+?) matches url.php, and at this point the negative lookahead is satisfied since the next character is a double quote.
In other words, .+? consumes the entire filename including the extension, making the lookahead a no-op.
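To see this for yourself, here is a small sketch that compares what the first group captures in each case. I don't have your full pattern, so assumed_original below is my guess at how the (.+?)(?!php) fragment was embedded (href="..." around it); adjust it to whatever you actually had.

```python
import re

# Assumption: only the (.+?)(?!php) fragment comes from the question;
# the surrounding href="..." is a guess at the full original pattern.
assumed_original = re.compile(r'href="(.+?)(?!php)"')

# The pattern from the top of this answer.
fixed = re.compile(r'href="([^"]*?)([.]php)?"')

sample = 'href="url.php"'
replacement = r'href="\1.php"'

# The lazy group keeps growing until a closing quote can follow it,
# so it ends up holding the extension as well.
captured_by_original = assumed_original.match(sample).group(1)
captured_by_fixed = fixed.match(sample).group(1)

result_original = assumed_original.sub(replacement, sample)
result_fixed = fixed.sub(replacement, sample)

print(captured_by_original)  # url.php
print(result_original)       # href="url.php.php"
print(captured_by_fixed)     # url
print(result_fixed)          # href="url.php"
```

With the assumed original pattern the extension gets appended a second time, because the lookahead is only ever checked right before the closing quote, where it can never fail. In the fixed pattern the optional ([.]php)? group takes the extension out of group 1 instead.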