To explain the regex you have:
/ # Starting regex delimiter
<img # Match <img
[^>]+ # Match one or more characters that aren't a >
> # Match a >
/ # Ending regex delimiter
i # Case-insensitive option
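Here is the same breakdown as running code. This is a minimal sketch in Python, which is an assumption on my part: the `/…/i` delimiters suggest PHP or JavaScript, but the pattern itself carries over unchanged. Python's `re.VERBOSE` flag lets the comments from the breakdown live inside the pattern, and `re.IGNORECASE` does the job of the `i` option. The names `IMG_TAG`, `html` and `tags` are only illustrative.

```python
import re

# The same pattern, written with re.VERBOSE so the breakdown's comments
# can sit inside the regex. re.IGNORECASE replaces the trailing "i" option.
IMG_TAG = re.compile(
    r"""
    <img      # Match <img
    [^>]+     # Match one or more characters that aren't a >
    >         # Match a >
    """,
    re.IGNORECASE | re.VERBOSE,
)

html = '<p>Hi</p><IMG SRC="a.png"><img src="b.png" alt="b">'
tags = IMG_TAG.findall(html)
print(tags)  # ['<IMG SRC="a.png">', '<img src="b.png" alt="b">']
```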
How does it work?
Imagine what an `img` tag looks like. It starts with `<img` and ends with `>`. So once we've identified an `<img` tag, we need to match everything until the nearest `>`.
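To see why "the nearest `>`" matters, compare the pattern with a greedy `.+` on an `img` tag that is followed by more markup. This is a Python sketch, and the sample string is made up for illustration.

```python
import re

html = '<img src="a.png"><p>caption</p>'

# Negated class: [^>]+ can never consume a >, so the match ends at the
# first > after <img, which is the end of the img tag.
nearest = re.search(r"<img[^>]+>", html, re.IGNORECASE).group(0)

# For contrast: a greedy .+ happily consumes > characters, so it runs on
# to the last > in the string.
greedy = re.search(r"<img.+>", html, re.IGNORECASE).group(0)

print(nearest)  # <img src="a.png">
print(greedy)   # <img src="a.png"><p>caption</p>
```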
That means we need to match as many characters as we can, as long as they are not a `>`. And that's exactly what `[^>]+` does: `[^>]` is a negated character class that matches any single character except `>`. Since there needs to be at least one of those characters (`<img>` is not legal), we use a `+` ("one or more") instead of the "zero or more" `*`.
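The difference between `+` and `*` shows up on a bare `<img>`. Another small Python sketch with made-up inputs; `plus` and `star` are just illustrative names.

```python
import re

plus = re.compile(r"<img[^>]+>", re.IGNORECASE)  # one or more
star = re.compile(r"<img[^>]*>", re.IGNORECASE)  # zero or more

bare = "<img>"
with_attr = '<img src="a.png">'

print(plus.search(bare))                # None: + needs at least one character
print(star.search(bare).group(0))       # <img>
print(plus.search(with_attr).group(0))  # <img src="a.png">
```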
You might see a problem here: what if the tag does contain a `>` somewhere, e.g. in an attribute value? Then the match stops at that `>` and the tag is cut short. And there you have one of the reasons why using regexes to parse HTML is fraught with peril.
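Here is that failure concretely, as a Python sketch with a made-up tag. A `>` inside a quoted attribute value such as `alt` is allowed in HTML, but `[^>]+` stops at it, so the match ends in the middle of the attribute.

```python
import re

IMG_TAG = re.compile(r"<img[^>]+>", re.IGNORECASE)

# The > inside the quoted alt value is legal HTML, but the negated
# class stops at the first > it sees, wherever that is.
html = '<img alt="1 > 0" src="a.png">'

match = IMG_TAG.search(html).group(0)
print(match)  # <img alt="1 >
```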