The reason you don't see words that begin (or end) with a non-ASCII (multibyte UTF-8) character is simple: \b is a word boundary, which by default is the position between a character from (and only from) \w (that is, [a-zA-Z0-9_]) and a character that is not in \w, or the start or end of the string.
To change the behaviour of \b (so that it works with all the letters and all the digits in the galaxy), you must use the u modifier. With this modifier, \w matches all Unicode letters and digits:
preg_match_all("/\b(" . implode("|", $find) . ")\b/iu", $str, $matches);
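To see the effect of the u modifier, here is a minimal sketch that runs the same pattern with and without it. The sample values of $str and $find are made up for illustration, and the preg_quote() step is an addition of mine so that metacharacters in the search words are matched literally.

```php
<?php
// Hypothetical sample data, not taken from the question.
$str  = "Un été à Paris";
$find = ["été", "paris"];

// Escape each word, then join the words into one alternation.
$quoted = array_map(function ($w) { return preg_quote($w, "/"); }, $find);
$alternation = implode("|", $quoted);

// Without u: the subject is read byte by byte and \w is ASCII-only,
// so there is no word boundary in front of the first byte of "é".
preg_match_all("/\b(" . $alternation . ")\b/i", $str, $ascii);

// With u: pattern and subject are treated as UTF-8 and \b sees "é" as a letter.
preg_match_all("/\b(" . $alternation . ")\b/iu", $str, $unicode);

print_r($ascii[1]);
print_r($unicode[1]);
```

In the default C locale, the first call should find only "Paris", while the second should find both "été" and "Paris".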
Another way is to replace the word boundaries with lookarounds:
preg_match_all("/(?<=^|[\s\pP])(" . implode("|", $find) . ")(?=[\s\pP]|$)/iu", $str, $matches);
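The lookaround variant can be wrapped in a small function, sketched below. The name find_words and the sample sentence are hypothetical, and the sketch uses the u modifier to switch on UTF-8 mode. A word counts as a match when it is surrounded by whitespace, punctuation, or the edges of the string.

```php
<?php
// Hypothetical helper, for illustration only.
function find_words(array $find, string $str): array
{
    // Escape each word so that regex metacharacters are matched literally.
    $quoted = array_map(function ($w) { return preg_quote($w, "/"); }, $find);

    // Lookbehind: start of string, or one whitespace/punctuation character.
    // Lookahead: one whitespace/punctuation character, or end of string.
    $pattern = "/(?<=^|[\s\pP])(" . implode("|", $quoted) . ")(?=[\s\pP]|$)/iu";

    preg_match_all($pattern, $str, $matches);
    return $matches[1];
}

$found = find_words(["été", "noël"], "L'été, puis Noël!");
print_r($found);
```

Note one difference from \b: the underscore is punctuation (\pP) but also part of \w, so this version treats "été_2" as containing the word "été", whereas the \b version does not.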