Regex to exclude matches within the title attribute

Question 1

Based on your comments clarifying that the fee codes you want are never found within a tag, I'd suggest a two-pass solution. First, remove all tags by replacing each one with a single space. Then search the result for the fee codes.

$content = preg_replace("/<[^>]+>/", " ", $content);
preg_match_all("/\b[A-Za-z]\d{5}\b/", $content, $matches);

This assumes no stray < or > is present.
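As a rough end-to-end sketch of the two passes, here is the same code run against a made-up input. The sample string and the codes in it are invented for illustration; only the two regexes come from the answer above.

```php
<?php
// Hypothetical input: Z99999 lives inside a title attribute and must be ignored.
$content = '<p>Fee <span title="Z99999 legacy">A12345</span> and B67890 apply.</p>';

// Pass 1: replace every tag (attributes included) with a single space.
$text = preg_replace('/<[^>]+>/', ' ', $content);

// Pass 2: find one letter followed by exactly five digits, on word boundaries.
preg_match_all('/\b[A-Za-z]\d{5}\b/', $text, $matches);

// Only the codes in the visible text remain: A12345 and B67890.
print_r($matches[0]);
```

Note that a literal > inside an attribute value (for example title="x > Z99999") would end the tag match early and let that code leak into the results, which is the reason for the assumption above.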

Of course, the usual warning applies: one should not use regex to parse HTML or XML.

Question 2

PHP has (*SKIP)(*FAIL) Magic

Resurrecting this question because it has a simple solution that wasn't mentioned. This problem is a classic case of the technique explained in this question on how to "regex-match a pattern, excluding..."

With all the usual warnings about using regex to parse HTML in mind, here is a simple way to do it.

We can solve it with a single regex:

(?i)<[^>]+(*SKIP)(*F)|[a-z]?\d{5}

The left side of the alternation | matches each tag from its opening < up to (but not including) the closing >, then deliberately fails. (*SKIP) stops the engine from retrying inside the text it just consumed, so the next match attempt starts at the closing >. The right side matches the pattern you want, and we know these matches are the right ones because they were not consumed by the expression on the left.

Sample Code

$regex = '~(?i)<[^>]+(*SKIP)(*F)|[a-z]?\d{5}~';
preg_match_all($regex, $yourstring, $matches);
print_r($matches[0]);
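To see what (*SKIP) contributes, here is a small comparison sketch. The input string is invented for illustration, and the second pattern (the same regex with (*SKIP) removed) is not part of the original answer; it is only there to show the failure mode.

```php
<?php
// Hypothetical input: Z99999 sits inside a title attribute and should be ignored.
$html = '<a title="Z99999 old fee">A12345</a> or 54321';

// With (*SKIP)(*F): the tag is consumed, the match fails, and the engine
// resumes after the consumed text instead of retrying inside it.
preg_match_all('~(?i)<[^>]+(*SKIP)(*F)|[a-z]?\d{5}~', $html, $withSkip);

// Without (*SKIP): after (*F) the engine simply moves one character forward,
// so the right-hand side gets a chance to match inside the tag.
preg_match_all('~(?i)<[^>]+(*F)|[a-z]?\d{5}~', $html, $withoutSkip);

print_r($withSkip[0]);    // A12345, 54321
print_r($withoutSkip[0]); // Z99999, A12345, 54321
```

The bare 54321 is matched in both cases because the letter in [a-z]? is optional.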