Question

I have a Unicode string stored in a CString, and I need to know the number of bytes this string takes in UTF-8 encoding. I know CString has a GetLength() method, but that returns the number of characters, not bytes.

I tried (among other things) converting it to a char array, but (logically, I guess) I only get an array of wchar_t, so this doesn't solve my problem.

To be clear about my goal: for the input "aaa", say, I want "3" as output (since "a" takes one byte in UTF-8). But for the input "āaa", I'd like to see "4" as output (since ā is a two-byte character).
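The expected numbers follow from how UTF-8 sizes each code point: the byte count depends only on the code point's value. A minimal C++17 sketch of that rule (the names Utf8Width and Utf8Length are hypothetical, for illustration only), working on UTF-32 text where each element is exactly one code point and assuming the input contains only valid code points:

```cpp
#include <cstddef>
#include <string_view>

// Number of UTF-8 bytes needed to encode one Unicode code point.
constexpr std::size_t Utf8Width(char32_t cp) {
    if (cp < 0x80) return 1;     // U+0000..U+007F, e.g. 'a'
    if (cp < 0x800) return 2;    // U+0080..U+07FF, e.g. U+0101 'ā'
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // U+10000..U+10FFFF
}

// Total UTF-8 length of a UTF-32 string; no terminating '\0' is counted.
constexpr std::size_t Utf8Length(std::u32string_view s) {
    std::size_t bytes = 0;
    for (char32_t cp : s) bytes += Utf8Width(cp);
    return bytes;
}
```

With this, Utf8Length(U"aaa") is 3 and Utf8Length(U"\u0101aa") is 4, matching the two examples above.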

I think this has to be a fairly common request, but even after 1.5 hours of searching and experimenting, I couldn't find the correct solution.

I have very little experience with Windows programming, so maybe I left out some crucial information. If so, please let me know and I'll add whatever information you need.


Solution

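As your CString contains a series of wchar_t (UTF-16 code units), you can just use WideCharToMultiByte with CP_UTF8 as the output code page. The function returns the number of bytes written to the output buffer, i.e. the length of the UTF-8 encoded string. Note that when you pass -1 as the input length, the input is treated as null-terminated and the returned count includes the terminating null byte.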

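#include <windows.h>
#include <atlstr.h>                      // CStringW
const int MAX_OUTSTR_SIZE = 256;
CStringW instr = L"\x0101" L"aa";        // "āaa"
char outstr[MAX_OUTSTR_SIZE];
int utf8_len = WideCharToMultiByte(CP_UTF8, 0, instr, -1, outstr, MAX_OUTSTR_SIZE, NULL, NULL);  // 5: 4 bytes + '\0'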
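To make clear what that returned count represents, here is a portable C++17 sketch that computes it directly from UTF-16 code units, without calling the Windows API. It is an illustration, not a replacement for WideCharToMultiByte: the function name Utf8LengthFromUtf16 is hypothetical, it uses char16_t because wchar_t is 32 bits on most non-Windows platforms, and counting an unpaired surrogate as 3 bytes (as if it were replaced by U+FFFD) is an assumption. Like an explicit-length call, it does not count a terminating null.

```cpp
#include <cstddef>
#include <string_view>

// UTF-8 byte count for UTF-16 text (what a CStringW holds on Windows).
// Assumption: an unpaired surrogate counts as 3 bytes, as if replaced by
// U+FFFD; a real converter may instead reject such input.
constexpr std::size_t Utf8LengthFromUtf16(std::u16string_view s) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;                        // ASCII
        } else if (c < 0x800) {
            bytes += 2;                        // e.g. U+0101 'ā'
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;                        // surrogate pair: one code point >= U+10000
            ++i;                               // skip the low surrogate
        } else {
            bytes += 3;                        // other BMP unit, or unpaired surrogate
        }
    }
    return bytes;
}
```

Characters outside the Basic Multilingual Plane occupy two wchar_t in a CString but form a single 4-byte UTF-8 sequence, which is why a simple per-unit lookup is not enough.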

If you don't need the output string, you can simply set the output buffer size (cbMultiByte) to 0:

  • cbMultiByte

    Size, in bytes, of the buffer indicated by lpMultiByteStr. If this parameter is set to 0, the function returns the required buffer size for lpMultiByteStr and makes no use of the output parameter itself.

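In that case the function returns the number of bytes the UTF-8 string needs, without writing anything. With -1 as the input length, that count includes the terminating null byte; to get exactly 3 for "aaa" and 4 for "āaa", pass the string's length instead (for a CString, GetLength()).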

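int utf8_len_with_nul = WideCharToMultiByte(CP_UTF8, 0, instr, -1, NULL, 0, NULL, NULL);      // includes the '\0'
int utf8_len = WideCharToMultiByte(CP_UTF8, 0, instr, lstrlenW(instr), NULL, 0, NULL, NULL);  // excludes it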

If your CString is really a CStringA (i.e. _UNICODE is not defined), you first need MultiByteToWideChar to convert the string to UTF-16, and then WideCharToMultiByte to convert from UTF-16 to UTF-8. See "How do I convert an ANSI string directly to UTF-8?". New code should never be compiled without Unicode support anyway, though.

Licensed under: CC-BY-SA with attribution