Question

I have been writing a new command line application in C++. One platform we support is, of course, Windows.

The Windows console, by default, uses the OEM code page for the system locale (for example, on my machine it is CP437 / DOS.Western). I think on a Cyrillic version of Windows it would be CP866, and so on. These OEM code pages contain only 256 characters.
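To see which code pages are actually in effect, a program can ask Windows directly. The sketch below is only an illustration: `code_page_name` and `print_active_code_pages` are hypothetical helper names, the name table covers just a few code pages, and the Windows calls (`GetACP`, `GetOEMCP`, `GetConsoleCP`, `GetConsoleOutputCP`) are compiled only when `_WIN32` is defined.

```cpp
#include <string>
#ifdef _WIN32
#include <cstdio>
#include <windows.h>
#endif

// Hypothetical helper: maps a few well-known code page identifiers to
// readable names. This is a small illustrative subset, not a full table.
std::string code_page_name(unsigned cp) {
    switch (cp) {
        case 437:   return "CP437 (OEM United States)";
        case 850:   return "CP850 (OEM Multilingual Latin 1)";
        case 866:   return "CP866 (OEM Russian)";
        case 1251:  return "Windows-1251 (ANSI Cyrillic)";
        case 1252:  return "Windows-1252 (ANSI Latin 1)";
        case 65001: return "UTF-8";
        default:    return "code page " + std::to_string(cp);
    }
}

#ifdef _WIN32
// Prints the code pages that affect this process: GetACP and GetOEMCP are
// the ANSI and OEM code pages in effect, while GetConsoleCP and
// GetConsoleOutputCP describe the attached console's input and output.
void print_active_code_pages() {
    std::printf("ANSI:           %s\n", code_page_name(GetACP()).c_str());
    std::printf("OEM:            %s\n", code_page_name(GetOEMCP()).c_str());
    std::printf("Console input:  %s\n", code_page_name(GetConsoleCP()).c_str());
    std::printf("Console output: %s\n", code_page_name(GetConsoleOutputCP()).c_str());
}
#endif
```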

I think what this means is the Windows console translates the input key strokes into characters based on the default code page. (And, if the currently selected font has a glyph for that character, it is displayed.)

  1. In such a case, does it make sense to use wmain/wchar_t and wide char types in my application?
  2. Is there any advantage to using wide types? Or is there any serious problem if just char * is used?
  3. When wide char types are used, what is the encoding of the command line arguments and environment strings (wchar_t * argv[] and wchar_t * envp[])? Are they converted to UTF-16 by the Windows CRT, or are they passed through untouched?

Thanks for your contributions.

Solution

You seem to assume that Windows works internally in the configured code page. That's not true: Windows works internally in Unicode (UTF-16). For legacy software that uses char instead of wchar_t, input and output are translated to and from the configured code page.
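To make the translation step concrete, here is a deliberately simplified sketch of what narrowing UTF-16 text to a single-byte OEM code page involves. It is a toy, not how Windows implements it (a real program would call `WideCharToMultiByte`): the hypothetical `to_cp437_toy` knows only ASCII plus a handful of CP437 letters and replaces everything else with `?`.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical toy converter, NOT the Windows implementation: narrows
// UTF-16 text to CP437 bytes using a tiny excerpt of the CP437 table.
// Anything not in the excerpt becomes '?', similar in spirit to the
// "default character" substitution that real conversions fall back to.
std::string to_cp437_toy(const std::u16string& text) {
    static const std::unordered_map<char16_t, unsigned char> excerpt = {
        {u'\u00FC', 0x81},  // u with diaeresis
        {u'\u00E9', 0x82},  // e with acute
        {u'\u00E4', 0x84},  // a with diaeresis
        {u'\u00C4', 0x8E},  // A with diaeresis
        {u'\u00F6', 0x94},  // o with diaeresis
        {u'\u00D6', 0x99},  // O with diaeresis
        {u'\u00DC', 0x9A},  // U with diaeresis
    };
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char16_t ch = text[i];
        if (ch < 0x80) {
            // ASCII has the same byte values in CP437.
            out.push_back(static_cast<char>(ch));
            continue;
        }
        auto it = excerpt.find(ch);
        if (it != excerpt.end()) {
            out.push_back(static_cast<char>(it->second));
            continue;
        }
        // A high surrogate followed by a low surrogate encodes one character
        // outside the BMP (e.g. an emoji); emit a single '?' for the pair.
        bool pair = ch >= 0xD800 && ch <= 0xDBFF && i + 1 < text.size() &&
                    text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF;
        if (pair) ++i;
        out.push_back('?');
    }
    return out;
}
```

Cyrillic or Chinese text passed through it comes out as a row of question marks, which is the kind of information loss a char-based program risks whenever the text does not fit the current code page.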

"I think what this means is the Windows console translates the input key strokes into characters based on the default code page"

This is not correct. The mapping of keystrokes to (Unicode) characters is defined by the keyboard layout, which is completely independent of the code page. For example, you could use a Chinese keyboard layout on a system using a Cyrillic code page.
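Because keystrokes arrive as Unicode, a console program can read them without any code page conversion by using the wide console API. The following is a minimal sketch assuming a Windows build: `read_console_line` and `trim_line_ending` are hypothetical helper names, the 512-character buffer is an arbitrary choice, and only `GetStdHandle` and `ReadConsoleW` are real Windows functions.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: removes the trailing CR/LF that line-based console
// input includes.
std::wstring trim_line_ending(std::wstring line) {
    while (!line.empty() && (line.back() == L'\n' || line.back() == L'\r'))
        line.pop_back();
    return line;
}

#ifdef _WIN32
// Hypothetical helper: reads one line of console input as UTF-16. Whatever
// keyboard layout the user types with, the characters arrive as Unicode.
// Lines longer than the buffer are returned piecewise by successive calls.
// Returns false if the call fails (e.g. stdin is not a console).
bool read_console_line(std::wstring& out) {
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    wchar_t buf[512];
    DWORD count = 0;
    if (!ReadConsoleW(in, buf, sizeof(buf) / sizeof(buf[0]), &count, nullptr))
        return false;
    out = trim_line_ending(std::wstring(buf, count));
    return true;
}
#endif
```

`ReadConsoleW` only works when standard input is an actual console; if input is redirected from a file or pipe, the call fails and the program has to read bytes (for example with `ReadFile` or the CRT) and decode them itself.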

  1. Not only does it make sense to use wchar_t, it is the recommended way.
  2. Yes, there is an advantage: your program can process every character Windows supports. If you use char, you can't handle any character that is not in the current code page.
  3. They are not converted: they stay what they are, namely UTF-16 strings.
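A minimal sketch of points 1 and 3, assuming a Windows build with MSVC (MinGW-w64 additionally needs the `-municode` flag so that `wmain` is used as the entry point). The arguments arrive as `wchar_t` strings and are echoed back; `_setmode(_fileno(stdout), _O_U16TEXT)` switches the CRT's stdout to UTF-16 so `wprintf` does not narrow the text to the console code page. `join_args` is a hypothetical helper and is the only part compiled on non-Windows builds.

```cpp
#include <string>
#ifdef _WIN32
#include <cstdio>
#include <fcntl.h>  // _O_U16TEXT
#include <io.h>     // _setmode
#endif

// Hypothetical helper: joins the arguments after the program name with
// single spaces. Works on any platform that has wchar_t.
std::wstring join_args(int argc, wchar_t* argv[]) {
    std::wstring joined;
    for (int i = 1; i < argc; ++i) {
        if (i > 1) joined += L' ';
        joined += argv[i];
    }
    return joined;
}

#ifdef _WIN32
// Windows-only entry point: argv holds UTF-16 strings taken from the
// process command line, with no code page conversion.
int wmain(int argc, wchar_t* argv[]) {
    // Put stdout in UTF-16 mode so wide output is not narrowed.
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wprintf(L"%ls\n", join_args(argc, argv).c_str());
    return 0;
}
#endif
```

Once stdout is in `_O_U16TEXT` mode, narrow functions such as `printf` must not be used on it; stick to the wide variants.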

Unfortunately, the command prompt itself is an 'ANSI' application, so it suffers from all the limitations of 'ANSI', and this affects your application if you run it from the command prompt. However, a console application can also be used in other ways, without a command prompt window, and then it can support Unicode fully.

Licensed under: CC-BY-SA with attribution