Question

I'm using TextWrangler grep to perform find/replace on multiple files and have run into a wall with the last find/replace I need to perform. I need to match any text between "> and the first instance of a <br /> in a line but the match cannot contain the character sequence [xcol]. The regex flavor is Perl-Compatible (PCRE) so lookbehind needs to be fixed-length.

Example Text to Search:

<p class="x03">FooBar<br />Bar</p>
<p class="x03">FooBar [xcol]<br />Bar</p>
<p class="x06">Hello World<br />[xcol]foo[xcol]bar<br /></p>
<p class="x07">Hello World[xcol]<br />[xcol]foo[xcol]bar<br /></p>  

Desired behavior of regex:
1st Line match ">FooBar<br />
2nd Line no match
3rd Line match ">Hello World<br />
4th Line no match

The text between "> and the <br /> will be captured in a group to be used with the replace function. The closest I got was using the following regex with negative lookahead, but this will not match the 3rd line as desired:

">((?!.*?\[xcol]).*?)<br />

Any help or advice is appreciated. Thank you.

Solution

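Try this regex:

">((?:(?!\[xcol]).)*?)<br\s*/>

A (short) explanation:

">               # match '">'
(                # start group 1
  (?:            #   start non-capturing group
    (?!\[xcol])  #     only if '[xcol]' does not start at this position...
    .            #     ...match any character (except line breaks)
  )*?            #   end non-capturing group; repeat zero or more times, as few as possible
)                # end group 1
<br\s*/>         # match '<br />' (the whitespace before '/' is optional)

The lookahead is re-checked before every character, so the match fails as soon as [xcol] appears before the first <br />, while any [xcol] after that <br /> is ignored. Your attempt runs the lookahead only once, right after ">, where its .*? scans the rest of the line; that is why it rejects line 3 as well. The repetition sits inside group 1 because a quantifier placed on the capturing group itself would leave only the last matched character in the group, not the whole text you want to reuse in the replacement.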
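
As a quick check outside TextWrangler, here is a minimal Python sketch that runs a variant of the pattern against the four sample lines from the question. It assumes Python's re module behaves like PCRE for these constructs, which holds for this lookahead and lazy quantifier but is not guaranteed for every pattern. The variant keeps the repetition inside group 1 so that the group holds the whole text; the names PATTERN, SAMPLE_LINES and captured_text are made up for the illustration.

```python
import re

# Variant of the pattern: the per-character lookahead sits in a non-capturing
# group, so group 1 holds all text between '">' and the first '<br />'.
PATTERN = re.compile(r'">((?:(?!\[xcol]).)*?)<br\s*/>')

# The four sample lines from the question.
SAMPLE_LINES = [
    '<p class="x03">FooBar<br />Bar</p>',
    '<p class="x03">FooBar [xcol]<br />Bar</p>',
    '<p class="x06">Hello World<br />[xcol]foo[xcol]bar<br /></p>',
    '<p class="x07">Hello World[xcol]<br />[xcol]foo[xcol]bar<br /></p>',
]


def captured_text(line):
    """Return group 1 of the first match in line, or None if nothing matches."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


results = [captured_text(line) for line in SAMPLE_LINES]
print(results)  # ['FooBar', None, 'Hello World', None]
```

Lines 1 and 3 match and lines 2 and 4 do not, which is the behavior asked for in the question.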

If . should match line breaks as well, either enable DOT-ALL mode (add (?s) before the .) or replace the . with something like [\s\S].
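
The two line-break options can be compared with a small Python sketch. The input string is hypothetical, and Python's re module stands in for PCRE here. One difference worth flagging: recent Python versions only accept a global inline flag such as (?s) at the very start of the pattern, so the sketch puts it there, whereas PCRE also allows it mid-pattern.

```python
import re

# Hypothetical input: the text before the first <br /> spans two lines.
TEXT = '<p class="x03">Foo\nBar<br />Baz</p>'

BASE = r'">((?:(?!\[xcol]).)*?)<br\s*/>'
WITH_CLASS = r'">((?:(?!\[xcol])[\s\S])*?)<br\s*/>'

default_match = re.search(BASE, TEXT)          # '.' cannot cross the '\n'
dotall_match = re.search('(?s)' + BASE, TEXT)  # inline DOT-ALL flag
class_match = re.search(WITH_CLASS, TEXT)      # [\s\S] matches any character

print(default_match)                # None
print(repr(dotall_match.group(1)))  # 'Foo\nBar'
print(repr(class_match.group(1)))   # 'Foo\nBar'
```

Both options give the same result; [\s\S] has the advantage of not changing how any other . in a longer pattern behaves.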

Licensed under: CC-BY-SA with attribution