How to detect Windows-1251 encoded characters [duplicate]

https://stackoverflow.com/questions/17544426

02-06-2022

Question

Is there a proper way to detect Windows-1251 encoded characters?

As I understand it, Windows-1251 is a single-byte (8-bit) encoding, unlike multi-byte encodings, so its bytes cannot be distinguished from those of other 8-bit encodings such as Latin-1. If I am wrong about this, please correct me.

My first clue is the locale: if the locale is ru, I treat all non-ASCII characters as Windows-1251.

Are there any better ways?

UPDATE:

Here is the context of my question: the ID3 tags of some MP3 files contain Windows-1251 encoded characters. I have to detect those characters and convert them to UTF-16 using icu4c; otherwise they are displayed as unreadable text on my system (Android). Perhaps some of you know a better way.

Solution 2

Given only an array of 8-bit characters as input, there is no reliable way to detect which 8-bit encoding was used to produce them.
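Since a certain answer is not possible, the most one can do is guess. The following is a minimal sketch of one such guess, not an established algorithm: it measures what share of the letters in a buffer fall on the byte values that Windows-1251 assigns to Russian letters (0xC0 to 0xFF, plus 0xA8 and 0xB8 for Ё and ё). Russian text consists almost entirely of such bytes, while Latin-1 text usually has only an occasional accented letter among ASCII letters. The function names and the 0.6 threshold are assumptions made for illustration. The check cannot tell Windows-1251 apart from KOI8-R, which also places its Cyrillic letters in 0xC0 to 0xFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: share (0.0 to 1.0) of letter bytes that sit on
   Windows-1251 Russian letter positions. ASCII letters count as letters
   but not as Cyrillic. Returns 0.0 if the buffer contains no letters. */
static double cp1251_cyrillic_share(const unsigned char *buf, size_t len)
{
    size_t letters = 0;   /* ASCII letters plus Cyrillic-position bytes */
    size_t cyrillic = 0;  /* Cyrillic-position bytes only */

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];

        if ((b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')) {
            letters++;
        } else if (b >= 0xC0 || b == 0xA8 || b == 0xB8) {
            /* 0xC0..0xFF map to U+0410..U+044F; 0xA8 and 0xB8 map to
               U+0401 and U+0451 in Windows-1251. */
            letters++;
            cyrillic++;
        }
    }
    return letters ? (double)cyrillic / (double)letters : 0.0;
}

/* Hypothetical decision rule. The 0.6 threshold is an arbitrary
   assumption; tune it against real data before relying on it. */
static int looks_like_cp1251(const unsigned char *buf, size_t len)
{
    return cp1251_cyrillic_share(buf, len) >= 0.6;
}
```

Calling looks_like_cp1251 on the six bytes CF F0 E8 E2 E5 F2 (the word "Привет" in Windows-1251) returns 1, while the Latin-1 bytes of "café" return 0 because only one letter in four is non-ASCII. Short strings such as ID3 titles give the heuristic very little to work with, so combining it with the locale hint is probably sensible.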

OTHER TIPS

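On Windows, the GetACP function returns the identifier of the ANSI code page that is currently active for the system. This describes the system's configuration rather than the encoding of any particular byte string, so, like the locale, it is only a hint.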

The documented list of code page identifiers can be found here. The one you're looking for is 1251, which corresponds to the "ANSI Cyrillic (Windows)" code page.

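It is simple to use from code, for example in C:

#include <Windows.h>

int main(void)
{
    /* GetACP returns the system's active ANSI code page identifier;
       1251 is the Cyrillic (Windows) code page. */
    if (GetACP() == 1251)
    {
        MessageBoxW(NULL,
                    L"Your system uses the ANSI Cyrillic code page.",
                    L"Code Page Detection",
                    MB_OK | MB_ICONINFORMATION);
    }
    return 0;
}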

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow