Given an array of 8-bit characters as input, there is no reliable way to detect which 8-bit encoding was used for those characters.
How to detect Windows-1251 encoded characters [duplicate]
02-06-2022
Question
Is there a proper way to detect Windows-1251 encoded characters?
IMO, unlike multi-byte encodings, Windows-1251 is an 8-bit character encoding, so it's impossible to distinguish it from other 8-bit encodings such as latin1. If I am wrong on this, please correct me.
My first clue is the locale: I treat all non-ASCII characters as Windows-1251 if the locale is ru.
Are there any better ways?
UPDATE:
Here is the context of my question: there are some Windows-1251 encoded characters in the ID3 info of some MP3 files. I have to detect the Windows-1251 encoded characters and then convert them to UTF-16 using icu4c; otherwise those characters will be unreadable on my system (Android). Perhaps some of you know a better way.
Solution 2
OTHER TIPS
On Windows, the GetACP function can be used to determine this for the system. It returns the identifier of the ANSI code page that is currently active for the system.
The documented list of code page identifiers can be found here. The one you're looking for is 1251, which corresponds to the "ANSI Cyrillic (Windows)" code page.
It is very simple to use from code, e.g. in C:
    #include <windows.h>

    int main()
    {
        /* GetACP returns the identifier of the system's active ANSI code page. */
        if (GetACP() == 1251)
        {
            MessageBoxW(NULL,
                        L"Your system uses the ANSI Cyrillic code page.",
                        L"Code Page Detection",
                        MB_OK | MB_ICONINFORMATION);
        }
        return 0;
    }