Question

I want to match a closing tag followed by zero or more spaces/newlines followed by an opening tag, but only when that opening tag is followed by a lowercase letter. Examples:

  • text</p> <p>blah matches </p> <p>
  • text</i><i>and more text <b>but not this</b> matches </i><i>
  • text</i> <i>And more text does not match

I tried this: </.*?>\s*\n*\s*<.*>(?=[a-z]), but it doesn't work for the second example: it matches </i><i>and more text <b>, even though the question mark should make it "lazy".

Solution

Try:

</[^>]+>\s*<[^/>]+>(?=[a-z])

Change each '+' to '*' if you also want to be able to match empty tags.
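
As a quick check, the pattern above can be run against the three examples from the question. This is only a sketch: Python's re module is an assumption here (the question does not name a regex flavor), and TAG_PAIR and find_tag_pairs are names made up for the illustration.

```python
import re

# Pattern from the answer: closing tag, optional whitespace, opening tag.
# The lookahead (?=[a-z]) checks the next character without consuming it.
TAG_PAIR = re.compile(r"</[^>]+>\s*<[^/>]+>(?=[a-z])")

def find_tag_pairs(text):
    """Return every closing/opening tag pair that precedes a lowercase letter."""
    return TAG_PAIR.findall(text)

samples = [
    "text</p> <p>blah",
    "text</i><i>and more text <b>but not this</b>",
    "text</i> <i>And more text",
]
for sample in samples:
    print(find_tag_pairs(sample))
# ['</p> <p>']
# ['</i><i>']
# []
```

Because [^>] can never consume a '>', each tag ends at its own closing bracket and the match cannot run on into later tags.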

OTHER TIPS

Making a quantifier lazy only makes the regex try the shortest possible match first, but if that doesn't work, it will gladly expand the match until the entire regex succeeds.
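
The effect can be seen in a small sketch. It assumes Python's re module, and it uses a hypothetical variant of the question's pattern in which both quantifiers are lazy; LAZY_PAIR and lazy_match are names invented for the illustration.

```python
import re

# Hypothetical variant of the question's pattern with BOTH quantifiers lazy.
LAZY_PAIR = re.compile(r"</.*?>\s*<.*?>(?=[a-z])")

def lazy_match(text):
    """Return the first match of the lazy pattern, or None."""
    m = LAZY_PAIR.search(text)
    return m.group() if m else None

# The shortest attempt already satisfies the lookahead ('a' follows '<i>').
print(lazy_match("text</i><i>and more"))                # </i><i>

# Here 'A' follows '<i>', so the lookahead fails on the shortest attempt.
# The engine then lets the second '.*?' grow until '<b>' is followed by 'b'.
print(lazy_match("text</i> <i>And more <b>bold</b>"))   # </i> <i>And more <b>
```

In the second call the lazy quantifier ends up spanning two tags, which is exactly the kind of match the question wants to exclude.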

You need to be more specific in what you allow to match - for example by not allowing angle brackets inside a tag:

</[^<>]*>\s*<[^/][^<>]*>(?=[a-z])

(Also, \s already contains \n, so \s*\n*\s* can be shortened to \s*)
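
Both points can be checked with a short sketch, again assuming Python's re module. STRICT_PAIR, LONG_FORM and first_pair are names chosen for this example; LONG_FORM is the same pattern with the longer whitespace part from the question put back in.

```python
import re

# Pattern from the answer above.
STRICT_PAIR = re.compile(r"</[^<>]*>\s*<[^/][^<>]*>(?=[a-z])")
# Same pattern, but with the question's longer whitespace part.
LONG_FORM = re.compile(r"</[^<>]*>\s*\n*\s*<[^/][^<>]*>(?=[a-z])")

def first_pair(text, pattern=STRICT_PAIR):
    """Return the first matching tag pair, or None if there is none."""
    m = pattern.search(text)
    return m.group() if m else None

# Angle brackets are excluded inside a tag, so the match cannot spill over.
print(first_pair("text</i><i>and more text <b>but not this</b>"))  # </i><i>
print(first_pair("text</i> <i>And more <b>bold</b>"))              # None
# Only the first character after '<' must not be '/', so attributes may contain one.
print(first_pair('text</p><a href="/home">link'))                  # </p><a href="/home">

# \s already covers \n, so both whitespace forms match the same text.
multiline = "text</p> \n\n  <p>blah"
print(first_pair(multiline) == first_pair(multiline, LONG_FORM))   # True
```

The third call shows one practical difference from a pattern that forbids '/' anywhere in the opening tag: such a pattern would not match an opening tag whose attribute value contains a slash.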

Licensed under: CC-BY-SA with attribution