Word Boundary Regular Expression Unless Inside HTML Tag

https://stackoverflow.com/questions/17141364

31-05-2022

Question

I have a regular expression using word boundaries that works exceedingly well...

~\b('.$value.')\b~i

...save for the fact that it also matches text inside HTML tags (e.g. title="This is blue!"). That's a problem because I'm doing text substitution on anything the regex matches, then making tooltips appear using those title attributes. So, as you can imagine, it substitutes text inside the title and breaks the HTML of the tooltip. For example, what should be:

Aqua

...ends up becoming...

Royal Blue">Aqua

My use of strip_tags didn't solve the issue; I think what I need is a better regular expression that simply will not match content ending in blue"> ('blue' in this case being a placeholder for any other color in the array I'm comparing it against).

Can anyone append what I need to the regular expression? Or do you have a better solution?

Solution 2

Regex replacements often seem like the solution, but they can have unwanted side effects on markup and may not accomplish what you want. Look into DOMDocument instead (as some commenters have suggested), so that you only ever touch text nodes and never attribute values.
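To make the DOMDocument suggestion concrete, here is a minimal sketch, not a definitive implementation. The input string, the $colors list and the <b> wrapper are all hypothetical stand-ins for the asker's real data and tooltip markup, and it assumes the DOM extension is available. It applies the word-boundary regex to text nodes only, so the title attribute is never touched.

```php
<?php
// Hypothetical input and word list, for illustration only.
$html   = '<p><span title="Royal Blue">Aqua</span> and blue paint</p>';
$colors = ['blue', 'aqua'];

// Quote each word so regex metacharacters in it are treated literally.
$quoted  = array_map(function ($w) { return preg_quote($w, '~'); }, $colors);
$pattern = '~\b(' . implode('|', $quoted) . ')\b~i';

$doc = new DOMDocument();
// These flags ask libxml not to add <html>/<body> wrappers or a doctype.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($doc);
// Copy the text nodes into an array before the loop changes the tree.
$textNodes = iterator_to_array($xpath->query('//text()'));

foreach ($textNodes as $node) {
    // With a capture group and DELIM_CAPTURE, odd indexes hold the matches.
    $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
    if (count($parts) < 2) {
        continue; // nothing to replace in this text node
    }
    $fragment = $doc->createDocumentFragment();
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Matched word: wrap it in an element (placeholder for the tooltip markup).
            $b = $doc->createElement('b');
            $b->appendChild($doc->createTextNode($part));
            $fragment->appendChild($b);
        } elseif ($part !== '') {
            // Unmatched text: keep it as plain text.
            $fragment->appendChild($doc->createTextNode($part));
        }
    }
    $node->parentNode->replaceChild($fragment, $node);
}

echo $doc->saveHTML();
```

Because the attribute value "Royal Blue" is not a text node, it is left alone even though it contains one of the words.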

But if you insist on using regex, here's a good post on SO. It uses two passes to accomplish what you want.

OTHER TIPS

Davey, resurrecting this question because, apart from the DOM solution, there is a better regex solution than the one mentioned so far. It's a simple solution that requires a single step.

The general solution is

<[^>]*>(*SKIP)(*F)|blue


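Any content within <> tags is simply skipped: the left side of the alternation matches a whole tag, then (*SKIP)(*F) deliberately fails the match and stops the engine from retrying inside that tag. Content between tags, such as blue, is matched by the right side, which sounds like it fits your needs.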

In the expression, replace "blue" with whatever you like.
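As a rough sketch of how this could be combined with the word boundaries from the question, assuming PHP's preg functions (PCRE supports the (*SKIP)(*F) verbs): the $colors list, the sample $html and the <b> replacement below are hypothetical and only there for illustration.

```php
<?php
// Hypothetical inputs for illustration; neither appears in the original question.
$colors = ['blue', 'aqua'];
$html   = '<span title="Royal Blue">Aqua</span> and blue paint';

// Quote each word so regex metacharacters in it are treated literally.
$alternatives = implode('|', array_map(
    function ($w) { return preg_quote($w, '~'); },
    $colors
));

// Left branch: consume a whole tag, then (*SKIP)(*F) throws that match away.
// Right branch: the whole-word match that actually gets replaced.
$pattern = '~<[^>]*>(*SKIP)(*F)|\b(?:' . $alternatives . ')\b~i';

// $0 is the matched word; the <b> wrapper stands in for the real tooltip markup.
$result = preg_replace($pattern, '<b>$0</b>', $html);

echo $result, PHP_EOL;
// prints: <span title="Royal Blue"><b>Aqua</b></span> and <b>blue</b> paint
```

Note that <[^>]*> assumes no attribute value contains a literal > character; markup like that would need the DOM approach instead.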

Reference

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow