mb_detect_encoding showing the same encoding

https://stackoverflow.com/questions/18126452

23-06-2022
Question

I have a weird problem with the following code:

$str = "נסיון"; // <--- Hebrew chars
echo mb_detect_encoding($str)."<br><br><br>";
$str = iconv(mb_detect_encoding($str), 'UCS-2BE', $str);
echo mb_detect_encoding($str)."<br><br><br>";

This will output:

UTF-8

This code is written in a file that's encoded (using Notepad++) in UTF-8 without BOM. I tried other file encodings, but that didn't work.

I also tried converting the string using:

$str = mb_convert_encoding($str,'UCS-2BE');

But that didn't work either. Any insights?

Solution

From the documentation for mb_detect_order, the function that establishes the order in which mb_detect_encoding tests different encodings:

mbstring currently implements the following encoding detection filters. If there is an invalid byte sequence for the following encodings, encoding detection will fail: UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP

For ISO-8859-*, mbstring always detects as ISO-8859-*.

For UTF-16, UTF-32, UCS2 and UCS4, encoding detection will fail always.

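So you can't detect the encoding of the second string, which is now UCS-2BE, with the mb functions. The conversion itself may well have succeeded; mb_detect_encoding just cannot report it.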
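
Because detection cannot confirm the result, one way to check the conversion is to inspect the bytes directly and convert back. The following is a minimal sketch, not a definitive approach. It assumes PHP 7.0 or later (for the "\u{...}" escapes) with the mbstring extension enabled, and the variable names are illustrative. The string is the same Hebrew word as in the question, written with escapes so the example does not depend on how the file is saved.

```php
<?php
// Same string as in the question ("נסיון"), written as Unicode escapes
// so the bytes are UTF-8 regardless of the source file's encoding.
$utf8 = "\u{05E0}\u{05E1}\u{05D9}\u{05D5}\u{05DF}";

// Name the source encoding explicitly instead of relying on detection.
$ucs2 = mb_convert_encoding($utf8, 'UCS-2BE', 'UTF-8');

// Inspect the raw bytes: each character becomes one big-endian 16-bit unit.
$hex = bin2hex($ucs2);
echo $hex, PHP_EOL; // 05e005e105d905d505df

// Converting back should reproduce the original UTF-8 string.
$roundTrip = mb_convert_encoding($ucs2, 'UTF-8', 'UCS-2BE');
echo ($roundTrip === $utf8 ? 'round trip OK' : 'round trip FAILED'), PHP_EOL;
```

If the hex dump shows two bytes per character starting with 05, the string really is UCS-2BE, whatever mb_detect_encoding reports for it.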

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow