Question

I need a RegEx that finds extraneous <br /> tags that occur before block tags, leaving all other <br /> tags intact.

Here's the text I am searching:

<div>some text<br id="first"/>some more text<br id="second"/></div>

However, when using the following RegEx:

</? *br.*?>(?=</? *([^(br)]).*?)

It selects everything from the first <br /> tag through the second one, like so:

<br id="first"/>some more text<br id="second"/>

...which isn't what I want. How can I modify the expression so it only selects <br id="second"/>?
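
The over-match can be reproduced outside Cocoa with a short sketch. Python is used here purely for illustration (it is not part of the original question), and the variable names are made up. Two things seem to cause the behaviour: `.` also matches `<` and `>`, so the lazy `.*?` keeps growing past the first `>` until the lookahead succeeds, and `[^(br)]` is a character class that excludes the single characters `(`, `b`, `r` and `)`, not the string "br".

```python
import re

# Sample text and original pattern, copied from the question.
text = '<div>some text<br id="first"/>some more text<br id="second"/></div>'
original = r'</? *br.*?>(?=</? *([^(br)]).*?)'

# group(0) is the whole match; the pattern's capture group is ignored here.
match = re.search(original, text)
overmatch = match.group(0)
print(overmatch)
```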

Notes: All inline tags except <br /> tags are stripped out before this point, so they won't be a factor. Also, I am using Obj-C/Cocoa, so I can't use all those fancy PHP functions. :) Finally, this will be a valid XHTML doc.


Solution

<br[^<>]*>(?=\s*<(?!br))

should do what you want.

Explanation of the regex:

<br     # Match <br
[^<>]*  # followed by any number of non-bracket characters
>       # and a >.
(?=     # Assert that we are right before...
 \s*    # optional whitespace,
 <      # followed by any tag
 (?!br) # except br
)       # (End of lookahead)
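
The commented breakdown above can be run almost verbatim as a verbose-mode pattern. The following is a minimal sketch in Python (chosen only for illustration, since the question targets Cocoa); the name `BR_BEFORE_TAG` is hypothetical, and the sample text is the one from the question.

```python
import re

# The answer's pattern, written in verbose mode so the comments can stay inline.
BR_BEFORE_TAG = re.compile(r"""
    <br       # Match <br
    [^<>]*    # followed by any number of non-bracket characters
    >         # and a >.
    (?=       # Assert that we are right before...
      \s*     # optional whitespace,
      <       # followed by any tag
      (?!br)  # except br
    )         # (End of lookahead)
""", re.VERBOSE)

text = '<div>some text<br id="first"/>some more text<br id="second"/></div>'

# The pattern has no capture groups, so findall returns the full matches.
matches = BR_BEFORE_TAG.findall(text)
print(matches)
```

Because `[^<>]*` cannot cross a bracket, each match is confined to a single tag, which is what keeps the first `<br />` out of the result.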

Some comments:

  • I've removed the optional slashes from your regex because </br> doesn't exist in HTML or XHTML.
  • I've also removed the optional spaces at the start of the tags because whitespace is not allowed between < and the tag name (nor is it allowed between / and >).
  • As an aside: In valid XHTML, <br /> may carry attributes; the XHTML 1.0 DTDs allow the core attributes (id, class, style, title) on br, so <br id="foo" /> is valid.
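
To illustrate the comments above, here is a hedged sketch that uses the simplified pattern to delete the matched tags. It is written in Python for illustration only; `strip_br_before_tags` and the sample strings are assumptions, not part of the original answer. The pattern uses only character classes and lookaheads, which ICU-style engines such as Cocoa's NSRegularExpression should also accept, but that has not been verified here.

```python
import re

# Same pattern as in the answer; the constant name is illustrative.
BR_BEFORE_TAG = re.compile(r'<br[^<>]*>(?=\s*<(?!br))')

def strip_br_before_tags(markup: str) -> str:
    """Remove every <br> tag that is directly followed by a non-<br> tag."""
    return BR_BEFORE_TAG.sub('', markup)

# Several spellings of the tag, with and without attributes or whitespace.
samples = [
    '<p>a<br>b<br></p>',
    '<p>a<br/>b<br/></p>',
    '<p>a<br />b<br />\n</p>',
    '<p>a<br id="x" />b<br id="y" /></p>',
]
cleaned = [strip_br_before_tags(s) for s in samples]
for line in cleaned:
    print(repr(line))
```

Two caveats follow from how the lookahead works: whitespace between the removed tag and the next tag is left in place, because the lookahead does not consume it, and in a run such as `<br/><br/></div>` only the last `<br/>` is matched, since the earlier one is followed by another `<br`.
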
Licensed under: CC-BY-SA with attribution