Question

I need to input special characters through stdin and there seems to be a problem doing this. I guess fgetws() doesn't support cp852 (standard console code page of my OS, Win 7 x64 btw). Should I use cp1250 or something else? I tried using chcp 1250 in cmd.exe but that lasts only until I close down the command prompt. I am on Visual C.

#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main()
{
    wchar_t query[64];
    setlocale(LC_ALL, "croatian");
    wprintf(L"Insert special characters: ");
    fgetws(query, 64, stdin);
    fputws(query, stdout);
    putchar('\n');
    return 0;
} 

Console output example


Solution

Your program is the wrong place to fix wrong locale settings. You should set the right locale in your environment and pick that value up in your code like this:

setlocale(LC_ALL, "");

This is what the man page says:

On startup of the main program, the portable "C" locale is selected as default. A program may be made portable to all locales by calling:

setlocale(LC_ALL, "");

EDIT:

Looking at your latest screenshots, it seems to me that something gets mixed up while reading the input.

case 1: (the one without any call to setlocale)

... doesn't seem too interesting. The default "C" locale only covers the 7-bit ASCII range (0x00-0x7F), so even though the result looks correct, this is more or less a garbage in, garbage out case. The value 0x9F is the code page 852 (see: http://de.wikipedia.org/wiki/Codepage_852) encoding of the Unicode character 'LATIN SMALL LETTER C WITH CARON' (U+010D) č.

Since the raw value is passed back and forth unchanged, it's no big surprise that the same output appears when the same byte is written to the terminal again.

case 2:

... looks a little bit more interesting. The value 0x17a is the UTF-16 encoding of the Unicode character 'LATIN SMALL LETTER Z WITH ACUTE' (U+017A) ź, which perfectly matches the output shown in your screenshot. Since fputws seems to map this correctly to the terminal encoding, the problem appears to be that the input isn't read properly.

Just to make sure nothing got confused after your changes: are you running the code like this?

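#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main () {
    wchar_t query[64];
    /* pick up the locale from the environment */
    setlocale (LC_ALL, "");

    if (fgetws(query, 64, stdin) == NULL)
      return -1;
    fputws(query, stdout);
    /* stay with wide output on stdout instead of mixing in putchar */
    putwchar(L'\n');

    return 0;
}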

EDIT:

Locale settings check

I forgot to mention one of the most interesting things about your test: the Unicode character 'LATIN SMALL LETTER Z WITH ACUTE' (U+017A) ź (the one output in your second screenshot) is represented by exactly the value 0x9F (the one you get reported when using the "raw" character code) in code page 1250.

Somehow fgetws seems to interpret the character codes using code page 1250 instead of code page 852.

To me it still seems that the locale settings somehow get mixed up. You should probably try running the following code to see which locale gets reported.

#include <locale.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
  char *locale;

  setlocale (LC_ALL, "");
  if ((locale = setlocale (LC_ALL, NULL)) == NULL)
    return -1;

  printf ("%s\n", locale);

  return 0;
}

On my system, for example, this prints: es_ES.utf8. The interesting part is the one after the dot '.', as it specifies the character encoding (utf8 in the example above).

Another thing to check might be the version of Visual Studio you are using, as there seems to be a bug in older versions when setting the default locale. (see: http://connect.microsoft.com/VisualStudio/feedback/details/709505/setlocale-lc-all-returns-incorrect-default-system-locale)

OTHER TIPS

Thank you mikyra, for trying to help. Solved by explicitly giving setlocale the default code page of my console as a second argument. Like this:

setlocale( LC_ALL, ".852" );

Hopefully, no new problems will arise. Thank you, MSDN.

Licensed under: CC-BY-SA with attribution