The first `(.*?)` will match the empty string between `>` and `I`, and since it's lazy, it'll test the next part of the regex immediately: `(?P<color>red)?`. But there's no `red` at that point, so the zero-occurrences option of `?` 'activates' and the regex continues to the next part, which is the second `(.*?)`. It'll again match the empty string between `>` and `I`, and since it's lazy, it'll check the next part of the regex: `<\/span>` (I'm taking it as a whole). That doesn't match yet, so the second `(.*?)` keeps expanding one character at a time until it does, which means it will match all the way there.
Indeed, your `results[1]` will be null or empty, as will `results[color]` (I don't remember if you have to quote `color` or not), and `results[3]` will contain `I love my red car.`
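If you want to see this for yourself, here is a minimal sketch in Python. The full pattern is my reconstruction from the pieces discussed above, and the `<span>` wrapper around the sample text is an assumption too, so adjust both to whatever you actually have.

```python
import re

# Pattern reconstructed from the pieces discussed above (an assumption,
# not necessarily the exact regex from the question).
PATTERN = re.compile(r"<span>(.*?)(?P<color>red)?(.*?)</span>")

# Sample input; the <span> wrapper is assumed.
SAMPLE = "<span>I love my red car.</span>"

match = PATTERN.search(SAMPLE)

first = match.group(1)        # the first lazy group stops immediately
color = match.group("color")  # the optional group takes its zero-occurrence branch
rest = match.group(3)         # the second lazy group expands up to </span>

print(repr(first), repr(color), repr(rest))
# prints: '' None 'I love my red car.'
```

Python reports the unmatched optional group as `None`; other engines may give you null or an empty string instead.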
One workaround is to use alternation (OR), as NickC mentioned in his answer. Another is to use a negative lookahead that checks each character before consuming it:
<span>((?:(?!\bred\b).)*(?<colour>\bred\b)?.*)<\/span>
As a side note, I would advise using the word boundaries so that you don't match things like `reduce` or `jarred`.
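A quick way to check the lookahead version, again as a Python sketch. Python needs the `(?P<name>...)` spelling for named groups, so I've changed that part of the pattern; `find_colour` is just a hypothetical helper name, and the test sentences other than the red-car one are made up for illustration.

```python
import re

# Same idea as the pattern above, with Python's (?P<name>...) named-group syntax.
LOOKAHEAD = re.compile(
    r"<span>((?:(?!\bred\b).)*(?P<colour>\bred\b)?.*)</span>"
)

def find_colour(html):
    """Return 'red' if it appears as a whole word inside the span, else None."""
    m = LOOKAHEAD.search(html)
    return m.group("colour") if m else None

print(find_colour("<span>I love my red car.</span>"))    # red
print(find_colour("<span>I love my blue car.</span>"))   # None
print(find_colour("<span>Please reduce speed.</span>"))  # None, thanks to \b
print(find_colour("<span>The door jarred open.</span>")) # None, thanks to \b
```

The `(?:(?!\bred\b).)*` part consumes characters only while the word `red` does not start at the current position, so the optional group gets its chance exactly where `red` begins instead of being skipped.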