Question

The mbstring PHP module has a strict_detection setting, documented here. Unfortunately, the manual is completely useless; it only says that this option "enables the strict encoding detection".

I did a few tests and could not find how any of the mbstring functions are affected by this. mb_check_encoding() and mb_detect_encoding() give exactly the same result for both valid and invalid UTF-8 input.

(edit:) The mbstring.strict_detection option was added in PHP 5.1.2.


Solution

The mbstring.strict_detection setting supplies the default value for the strict (third) parameter of mb_detect_encoding(). Without strict detection, the encoding detection is faster but less accurate. For example, take a UTF-8 string that ends in a partial UTF-8 sequence:

$s = "H\xC3\xA9ll\xC3";
$encoding = mb_detect_encoding($s, mb_detect_order(), false);

The result of the mb_detect_encoding() call would still be "UTF-8" even though the string is not valid UTF-8: the trailing \xC3 starts a two-byte sequence but has no continuation byte, so the last character is incomplete.

But if you set the strict parameter to true...

$s = "H\xC3\xA9ll\xC3";
$encoding = mb_detect_encoding($s, mb_detect_order(), true);

It would perform a more thorough check, and the result of that call would be FALSE.
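
To tie this back to the ini setting from the question, here is a minimal sketch. It assumes mbstring.strict_detection can be changed at runtime with ini_set() and that it is only consulted when the third argument is omitted. The candidate list is passed explicitly (an assumption made for reproducibility) so the result does not depend on the configured detect order. mb_check_encoding() is included for comparison; as far as I can tell it always validates fully, which would explain why the tests in the question showed no difference for it.

```php
<?php
// Truncated UTF-8: the trailing 0xC3 opens a two-byte sequence that never completes.
$s = "H\xC3\xA9ll\xC3";

// Explicit candidate list, so the result does not depend on mbstring.detect_order.
$candidates = ['ASCII', 'UTF-8'];

// Strict detection requested explicitly: the string is valid in neither encoding.
$strict = mb_detect_encoding($s, $candidates, true);
var_dump($strict); // bool(false)

// Turn the ini setting on and omit the third argument.
// Assumption: the call then behaves like the strict call above.
ini_set('mbstring.strict_detection', '1');
$viaIni = mb_detect_encoding($s, $candidates);
var_dump($viaIni);

// mb_check_encoding() validates against a single encoding.
$valid = mb_check_encoding($s, 'UTF-8');
var_dump($valid); // bool(false)
```

If the second var_dump() differs from the first on your PHP version, the third argument is probably being defaulted in some other way, so passing strict explicitly is the safer habit.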

Licensed under: CC-BY-SA with attribution