Question

Does GCC's standard library, Boost, or any other library implement iostream-compliant versions of ifstream or ofstream that support conversion between UTF-8-encoded (file) streams and a std::vector<wchar_t> or std::wstring?


Solution

Your question doesn't quite work as posed. UTF-8 is a specific encoding, while wchar_t is a data type. Moreover, the standard intends wchar_t to represent the system's character set, but the details are left entirely to the platform, and the standard makes no requirements about the encoding.

Therefore, the correct thing to ask for is, first of all, conversion between the system's narrow, multibyte encoding and the system's fixed-width wide encoding. This functionality is provided by std::mbstowcs and std::wcstombs. There is also a locale facet that wraps this, std::codecvt<wchar_t, char, std::mbstate_t>, but that's a bit of a niche area of the library.
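As a rough illustration of that first step, here is a minimal sketch built on std::mbstowcs and std::wcstombs. The helper names to_wide and to_narrow are made up for this example, and both functions interpret narrow strings in whatever encoding the current C locale uses, so they only handle UTF-8 if the program has selected a UTF-8 locale beforehand.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: narrow multibyte string (current C locale) -> wide string.
std::wstring to_wide(const std::string& mb)
{
    // A wide string never has more elements than the narrow input has bytes.
    std::vector<wchar_t> buf(mb.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), mb.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");
    return std::wstring(buf.data(), n);
}

// Hypothetical helper: wide string -> narrow multibyte string (current C locale).
std::string to_narrow(const std::wstring& ws)
{
    // Each wide character needs at most MB_CUR_MAX bytes in the current locale.
    std::vector<char> buf(ws.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), ws.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in the current locale");
    return std::string(buf.data(), n);
}
```

A program would typically call std::setlocale(LC_ALL, "") once at startup so that these functions follow the user's environment; in the default "C" locale only plain ASCII is guaranteed to convert. Both helpers stop at the first embedded null character.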

If you want to convert between the opaque "system's encoding" prescribed by the standard and a definite encoding prescribed by your serialized data source/sink, you need an extra library. I'd recommend POSIX's iconv(), which is widely available. (The Windows API takes a different approach and offers its own special functions for conversion.)
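For the iconv() route, a sketch along the following lines should work on glibc-based systems. It relies on two assumptions that are not guaranteed by POSIX: that the implementation accepts the encoding name "WCHAR_T" for the platform's wide encoding (glibc and GNU libiconv do), and that iconv() takes its input pointer as char** (some platforms declare it const char**). The function name utf8_to_wide is made up for this example.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into the platform's wchar_t encoding.
std::wstring utf8_to_wide(const std::string& utf8)
{
    // Assumption: "WCHAR_T" is a recognized encoding name on this platform.
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("iconv_open failed");

    // The output never needs more wide characters than the input has bytes.
    std::vector<wchar_t> out(utf8.size() + 1);

    char* in_ptr = const_cast<char*>(utf8.data());
    std::size_t in_left = utf8.size();
    char* out_ptr = reinterpret_cast<char*>(out.data());
    std::size_t out_left = out.size() * sizeof(wchar_t);

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("iconv conversion failed");

    // out_left counts the unused bytes of the output buffer.
    std::size_t count = out.size() - out_left / sizeof(wchar_t);
    return std::wstring(out.data(), count);
}
```

On systems where iconv lives in a separate library, the program also has to be linked with -liconv.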

C++11 alleviates the issue slightly by adding an explicit family of UTF-encoded string types and literals, and presumably also transcoding facilities among those (though I've never seen them implemented by anyone).
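For reference, the C++11 facilities in question are the char16_t/char32_t string types plus the std::wstring_convert template and the facets in <codecvt>. The sketch below shows how they fit together, assuming a standard library that ships <codecvt>; the function names are made up for this example. Note that these facilities were later deprecated in C++17, so newer compilers may emit warnings.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: UTF-8 bytes -> UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);  // throws std::range_error on malformed input
}

// Hypothetical helper: UTF-8 bytes -> UTF-32 code points.
std::u32string utf8_to_utf32(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(utf8);  // throws std::range_error on malformed input
}
```

The reverse direction uses the same converter objects with to_bytes() instead of from_bytes().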

Here are my standard responses from past posts on the subject: Q1, Q2, Q3. C++11 will be a joy once it's fully available :-)

OTHER TIPS

The C++11 solution is to wrap the UTF-8 stream in an appropriate std::wbuffer_convert:

#include <codecvt>
#include <fstream>
#include <locale>   // std::wbuffer_convert
#include <string>

int main()
{
    std::ifstream utf8file("test.txt"); // assumes the file holds UTF-8 data
    // Decode UTF-8 bytes from the file's buffer into wchar_t on the fly.
    std::wbuffer_convert<std::codecvt_utf8<wchar_t>> conv(utf8file.rdbuf());
    std::wistream ucsbuf(&conv);
    std::wstring line;
    std::getline(ucsbuf, line); // line now holds UCS-2 or UCS-4, depending on the platform's wchar_t
}
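Since the question also asks about ofstream, the same wrapper can be pointed at an output file. The following is a minimal sketch under the same assumption that the standard library provides <codecvt>; the function name write_utf8 is made up for this example.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write a wide string to `path`, encoded as UTF-8.
void write_utf8(const std::string& path, const std::wstring& text)
{
    std::ofstream file(path, std::ios::binary);
    // Encode wchar_t values as UTF-8 bytes on their way into the file's buffer.
    std::wbuffer_convert<std::codecvt_utf8<wchar_t>> conv(file.rdbuf());
    std::wostream out(&conv);
    out << text;
    out.flush(); // hand the converted bytes to the file buffer before conv goes away
}
```

The explicit flush matters because the converter keeps its own buffer; the file itself is flushed and closed when the std::ofstream is destroyed at the end of the function.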

At the time of writing, this works with Visual Studio 2010 and with clang++/libc++ but, unfortunately, not with GCC, whose standard library does not yet provide <codecvt>.

Until this becomes widespread, third-party libraries are indeed the best solution.

Licensed under: CC-BY-SA with attribution