Question

How do I convert multi-byte text strings, for example Simplified Chinese GB 2312, to UTF-8 using C++?

Solution

On Unix systems, your best option is the iconv library.

See iconv_open, iconv, and iconv_close.

You have to know the character encoding of the input, of course. GB 2312 text is usually stored as EUC-CN or HZ.
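
As a rough illustration of those three calls, here is a minimal sketch. The function name to_utf8 and the 256-byte scratch buffer are made up for this example, and it assumes a glibc-style iconv that takes char** for the input pointer (some platforms declare it as const char**). It also assumes the platform's iconv recognises the encoding name you pass, such as "GB2312" or "EUC-CN".

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `input` from the encoding named `from` to UTF-8.
std::string to_utf8(const std::string& input, const char* from) {
    iconv_t cd = iconv_open("UTF-8", from);  // arguments are (to, from)
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open: unsupported conversion");

    std::string output;
    char buffer[256];
    // glibc declares the input as char**; iconv does not modify the bytes.
    char* in = const_cast<char*>(input.data());
    std::size_t in_left = input.size();

    while (in_left > 0) {
        char* out = buffer;
        std::size_t out_left = sizeof buffer;
        std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        // Keep whatever was converted in this round.
        output.append(buffer, sizeof buffer - out_left);
        // E2BIG only means the scratch buffer filled up, so loop again.
        // Anything else is an invalid or truncated input sequence.
        if (rc == (std::size_t)-1 && errno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv: invalid or incomplete input");
        }
    }
    iconv_close(cd);
    return output;
}
```

For example, to_utf8("\xD6\xD0\xCE\xC4", "GB2312") should return the UTF-8 bytes E4 B8 AD E6 96 87. On platforms where iconv is not part of libc you may need to link with -liconv.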

If you're not on a Unix system, look for conversion support in the OS. Doing character conversions by hand is very hard to get right.

OTHER TIPS

WinAPI: use MultiByteToWideChar and its inverse, WideCharToMultiByte. I can post a sample later.

However, UTF-8 is rather tricky to work with inside an application. The MultiByteToWideChar function converts a string to UTF-16 (not UCS-2, which cannot represent characters outside the Basic Multilingual Plane). I suggest you use UTF-16 internally in your software, and only convert to UTF-8 with WideCharToMultiByte when your program needs to produce such output. Keeping text as UTF-16 internally is the usual approach to internationalization/Unicode on Windows and OS X.
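
A minimal sketch of that two-step conversion is below. The helper names bytes_to_wide and wide_to_utf8 are invented for this example, and it assumes the input uses Windows code page 936, which is the code page Windows normally uses for Simplified Chinese GB 2312/GBK text. The code only compiles on Windows, hence the _WIN32 guard.

```cpp
#ifdef _WIN32  // Windows-only sketch
#include <windows.h>

#include <stdexcept>
#include <string>

// Hypothetical helper: decodes bytes in the given code page into UTF-16.
std::wstring bytes_to_wide(const std::string& input, UINT code_page) {
    if (input.empty()) return std::wstring();
    // First call with a null output buffer asks for the required length.
    int len = MultiByteToWideChar(code_page, MB_ERR_INVALID_CHARS,
                                  input.data(), (int)input.size(), nullptr, 0);
    if (len == 0) throw std::runtime_error("MultiByteToWideChar failed");
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(code_page, MB_ERR_INVALID_CHARS,
                        input.data(), (int)input.size(), &wide[0], len);
    return wide;
}

// Hypothetical helper: encodes a UTF-16 string as UTF-8 for output.
std::string wide_to_utf8(const std::wstring& wide) {
    if (wide.empty()) return std::string();
    int len = WideCharToMultiByte(CP_UTF8, 0, wide.data(), (int)wide.size(),
                                  nullptr, 0, nullptr, nullptr);
    if (len == 0) throw std::runtime_error("WideCharToMultiByte failed");
    std::string utf8(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), (int)wide.size(),
                        &utf8[0], len, nullptr, nullptr);
    return utf8;
}
#endif
```

Usage would look like wide_to_utf8(bytes_to_wide(gb_text, 936)), keeping the intermediate std::wstring around if the program works on the text before writing it out.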

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow