With a regex, you could use a lookbehind and a lookahead to find duplicates:
$pattern = '/(?<=class=")(?:([-\w]+) (?=\1[ "]))+/i';
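This matches a run of the same class name (capture group 1, ([-\w]+)) directly after class=, as long as another copy follows,
so replacing the match with an empty string leaves a single copy.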
$str = '<li class="active active">';
echo preg_replace($pattern, "", $str);
output:
<li class="active">
Test at regex101
EDIT 08.04.2014
To remove duplicates that are not directly after the lookbehind (?<=class=")
...
The problem is that a lookbehind assertion must have a fixed length, so something like (?<=class="[^"]*?)
is not possible. As an alternative, \K can be used, which resets the start of the reported match. A pattern could be:
$pattern = '/class="[^"]*?\K(?<=[ "])(?:([-\w]+) (?=\1[ "]))+/i';
You can think of everything before \K
as a virtual lookbehind of variable length.
This regex, like the first one, only removes repeats of one class name that directly follow each other.
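To make that limitation concrete, here is a small sketch that runs the \K pattern on two sample strings of my own (the class names are just placeholders). The first has an adjacent duplicate that is not at the start of the attribute; the second has the same duplicate separated by another class.

```php
<?php
$pattern = '/class="[^"]*?\K(?<=[ "])(?:([-\w]+) (?=\1[ "]))+/i';

// Adjacent duplicate after another class name: the extra copy is removed.
$adjacent = preg_replace($pattern, '', '<li class="nav active active">');
echo $adjacent, "\n"; // <li class="nav active">

// Same duplicate, but separated by "nav": nothing matches, string is unchanged.
$separated = preg_replace($pattern, '', '<li class="active nav active">');
echo $separated, "\n"; // <li class="active nav active">
```

So the pattern is fine for cleaning up repeated neighbours, but not for duplicates scattered across the attribute.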
EDIT 11.09.2014
Finally, I think a single regex that strips out all the different duplicates gets rather complex:
/(?>(?<=class=")|(?!^)\G)(?>\b([-\w]++)\b(?=[^"]*?\s\1[\s"])\s+|[-\w]+\s+\K)/
This one uses continuous matching with \G as soon as class=
is found.
Test at regex101; Also see SO Regex FAQ
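As a quick sketch of how this pattern could be used from PHP, assuming a sample string of my own with duplicates scattered across a single class attribute:

```php
<?php
$pattern = '/(?>(?<=class=")|(?!^)\G)(?>\b([-\w]++)\b(?=[^"]*?\s\1[\s"])\s+|[-\w]+\s+\K)/';

$html   = '<li class="a1 a1 li li-home active li li active a1">';
$result = preg_replace($pattern, '', $html);

echo $result, "\n"; // <li class="li-home li active a1">
```

Because the lookahead removes a name whenever the same name appears again later in the attribute, it is the last occurrence of each class that survives, so the order of the remaining classes can differ from the original order of first appearance.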
A simpler way using regex would be preg_replace_callback():
$html = '<li class="a1 a1 li li-home active li li active a1">';
$html = preg_replace_callback('/\sclass="\K[^"]+/', function ($m) {
    return trim(implode(" ", array_unique(preg_split('~\s+~', $m[0]))));
}, $html);
Note that anonymous functions require PHP 5.3 or later; on older versions, pass the name of a regular function instead.
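For older PHP versions, a minimal sketch of the same idea with a named callback could look like this. The function name dedupe_class_names is my own choice, and I use PREG_SPLIT_NO_EMPTY in place of trim() to drop empty pieces from leading or trailing whitespace.

```php
<?php
// Hypothetical helper name; receives the match array from preg_replace_callback().
function dedupe_class_names($m)
{
    // Split the attribute value on whitespace, skipping empty pieces.
    $names = preg_split('~\s+~', $m[0], -1, PREG_SPLIT_NO_EMPTY);

    // array_unique() keeps the first occurrence of each name.
    return implode(' ', array_unique($names));
}

$html = '<li class="a1 a1 li li-home active li li active a1">';
$html = preg_replace_callback('/\sclass="\K[^"]+/', 'dedupe_class_names', $html);

echo $html, "\n"; // <li class="a1 li li-home active">
```

Unlike the single-regex version, this keeps the first occurrence of each class, so the original order is preserved.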