If the input is UTF-8 encoded, you can use a Unicode regex to match or strip invisible control characters such as e2808e (LEFT-TO-RIGHT MARK). Use the u (PCRE_UTF8) modifier together with \p{C} or its long form \p{Other}.
Strip out all invisibles:
$str = preg_replace('/\p{C}+/u', "", $str);
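\p{Other} comprises the following Unicode general categories: Cc (control), Cf (format), Cs (surrogate), Co (private use), and Cn (unassigned).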
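One thing to watch out for: \p{C} also matches ordinary control characters such as tab, line feed and carriage return, so the strip above removes those too. If you want to keep them, a negated character class is one possible approach. This is only a sketch; the function name is made up for illustration, and the set of characters to keep (\r, \n, \t) is an assumption you may want to adjust:

```php
<?php
// Hypothetical helper (not part of the original answer): strip \p{C}
// characters but keep tab, line feed and carriage return.
// [^\P{C}\r\n\t] reads as "not (non-C or \r or \n or \t)",
// i.e. everything in \p{C} except those three characters.
function strip_invisibles_keep_whitespace(string $str): ?string
{
    // preg_replace() returns null on error, e.g. for invalid UTF-8 input
    return preg_replace('/[^\P{C}\r\n\t]+/u', '', $str);
}

$clean = strip_invisibles_keep_whitespace("a\tb\xE2\x80\x8Ec\n");
echo bin2hex($clean), "\n"; // 610962630a  (a, TAB, b, c, LF)
```

The left-to-right mark is removed while the tab and the trailing newline survive.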
Detect/identify invisibles:
$str = ".\xE2\x80\x8E.\xE2\x80\x8B.\xE2\x80\x8F";
// get invisibles + offset
if (preg_match_all('/\p{C}/u', $str, $out, PREG_OFFSET_CAPTURE))
{
    echo "<pre>\n";
    // each match is [matched bytes, byte offset]
    foreach ($out[0] as $v) {
        echo "detected ".bin2hex($v[0])." @ offset ".$v[1]."\n";
    }
    echo "</pre>";
}
outputs:
detected e2808e @ offset 1
detected e2808b @ offset 5
detected e2808f @ offset 9
To identify a detected character, search for its hex bytes, e.g. with a Google query restricted to fileformat.info:
site:fileformat.info e2808e
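If you would rather search by code point (U+200E and so on) than by raw bytes, the matched bytes can be decoded by hand. The following is a minimal sketch, not part of the original answer: utf8_codepoint() is a made-up helper name, and it assumes its argument is exactly one valid UTF-8 encoded character, which holds for matches of /\p{C}/u:

```php
<?php
// Hypothetical helper: decode a single UTF-8 encoded character
// into its Unicode code point. No validation is done.
function utf8_codepoint(string $ch): int
{
    $bytes = array_values(unpack('C*', $ch)); // byte values, 0-indexed
    $n = count($bytes);
    if ($n === 1) {
        return $bytes[0]; // single byte: plain ASCII
    }
    // Lead byte: mask off the length marker bits (110xxxxx, 1110xxxx, 11110xxx)
    $cp = $bytes[0] & (0xFF >> ($n + 1));
    // Continuation bytes each contribute their low 6 bits
    for ($i = 1; $i < $n; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

$str = ".\xE2\x80\x8E.\xE2\x80\x8B.\xE2\x80\x8F";
if (preg_match_all('/\p{C}/u', $str, $out)) {
    foreach ($out[0] as $ch) {
        echo bin2hex($ch), " = ", sprintf('U+%04X', utf8_codepoint($ch)), "\n";
    }
}
// e2808e = U+200E
// e2808b = U+200B
// e2808f = U+200F
```

Where the mbstring extension is available, mb_ord($ch, 'UTF-8') returns the same value without the manual decoding.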