Question

I've read Wikipedia's article on Windows-1252 character encoding. For characters whose byte value is < 128, it should be the same as ASCII/UTF-8.

This makes sense:

php -r "var_export(mb_detect_encoding(\"\x92\", 'windows-1252', true));" 'Windows-1252'

A left curly apostrophe is detected properly.

php -r "var_export(mb_detect_encoding(\"a\", 'windows-1252', true));" false

Huh? The letter "a" isn't Windows-1252?

My terminal, where I"m running this, is set to UTF-8. So that should be the same byte sequence as ASCII for the letter 'a'. For the sake of minimizing the variables, if I specify the right Windows-1252 byte sequence:

php -r "var_export(mb_detect_encoding(\"\x61\", 'windows-1252', true));" false

Changing the "strict" parameter (which has pretty useless documentation) does nothing in these cases.

Was it helpful?

Solution

Encoding detection is not supported for windows-1252. According to the mb_detect_order documentation:

mbstring currently implements the following encoding detection filters. If there is an invalid byte sequence for the following encodings, encoding detection will fail.

UTF-8, UTF-7, ASCII, EUC-JP,SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP

For ISO-8859-, mbstring always detects as ISO-8859-.

For UTF-16, UTF-32, UCS2 and UCS4, encoding detection will fail always.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top