Question

How can I print (cout / wcout / ...) char32_t to console in C++11?

The following code prints hex values:

u32string s2 = U"Добрый день";
for(auto x:s2){
    wcout<<(char32_t)x<<endl;
}

Solution

First, I don't think wcout is supposed to print anything other than char and wchar_t as characters. char32_t is neither, so in C++11 it is promoted to an integer type and printed as a number.

Here's a sample program that prints individual wchar_t's:

#include <iostream>

using namespace std;

int main()
{
  wcout << (wchar_t)0x41 << endl;
  return 0;
}

Output (ideone):

A

Currently, it's impossible to get consistent Unicode output in the console even in major OSes. Simplistic Unicode text output via cout, wcout, printf(), wprintf() and the like won't work on Windows without major hacks. The problem of getting readable Unicode text in the Windows console is in having and being able to select proper Unicode fonts. Windows' console is quite broken in this respect. See this answer of mine and follow the link(s) in it.

OTHER TIPS

I know this question is very old, but I had to solve the same problem myself, so here is my approach. The idea is to switch between the UTF-32 and UTF-8 encodings of Unicode: cout can print UTF-8 strings (on a console that expects UTF-8), so you translate each UTF-32 encoded char32_t to UTF-8 and you're done. These are the low-level functions I came up with (no Modern C++). They can probably be optimized; any suggestion is appreciated.

#include <ostream>
#include <stdexcept>

char* char_utf32_to_utf8(char32_t utf32, char* buffer)
// Encodes the UTF-32 encoded char into a UTF-8 string.
// Stores the result in the buffer and returns a pointer to the
// terminating '\0' written after the encoded bytes
// (unchecked access, be sure to provide a buffer of at least 5 chars)
{
    char* end = buffer;
    if(utf32 < 0x80) *(end++) = static_cast<char>(utf32);       // 1 byte:  0xxxxxxx
    else if(utf32 < 0x800) {                                    // 2 bytes: 110xxxxx 10xxxxxx
        *(end++) = static_cast<char>(0xC0 | (utf32 >> 6));
        *(end++) = static_cast<char>(0x80 | (utf32 & 0x3F));
    }
    else if(utf32 < 0x10000) {                                  // 3 bytes: 1110xxxx + 2 continuation bytes
        *(end++) = static_cast<char>(0xE0 | (utf32 >> 12));
        *(end++) = static_cast<char>(0x80 | ((utf32 >> 6) & 0x3F));
        *(end++) = static_cast<char>(0x80 | (utf32 & 0x3F));
    }
    else if(utf32 < 0x110000) {                                 // 4 bytes: 11110xxx + 3 continuation bytes
        *(end++) = static_cast<char>(0xF0 | (utf32 >> 18));
        *(end++) = static_cast<char>(0x80 | ((utf32 >> 12) & 0x3F));
        *(end++) = static_cast<char>(0x80 | ((utf32 >> 6) & 0x3F));
        *(end++) = static_cast<char>(0x80 | (utf32 & 0x3F));
    }
    else throw std::range_error("code point above U+10FFFF");   // any exception type works here
    *end = '\0';
    return end;
}

You can implement this function in a class, in a constructor, in a template, or wherever you prefer.

Here is the overloaded operator for a null-terminated char32_t array:

std::ostream& operator<<(std::ostream& os, const char32_t* s)
{
    char buffer[5] = {0}; // That's the famous "big-enough buffer": up to 4 UTF-8 bytes plus '\0'
    while(s && *s)
    {
        char_utf32_to_utf8(*(s++), buffer);
        os << buffer;
    }
    return os;
}

And the overload for std::u32string, which forwards to the char32_t array overload:

std::ostream& operator<<(std::ostream& os, const std::u32string& s)
{
    return (os << s.c_str());
}

Running the simplest possible test, with the example Unicode characters from Wikipedia:

int main()
{
    std::cout << std::u32string(U"\x10437\x20AC") << std::endl;
}

prints 𐐷€ on the (Linux) console. It should still be tested with other Unicode characters, though.

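Note that endianness only matters when UTF-32 is read as raw bytes (for example from a file or a network buffer); the shifts and masks above operate on char32_t values, so they do not depend on the host byte order.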

Licensed under: CC-BY-SA with attribution