Let's see if I can explain this without too many factual errors...

I'm writing a string class and I want it to use UTF-8 (stored in a std::string) as its internal storage. I want it to accept both "normal" std::string and std::wstring as input and to produce both as output.

Working with std::wstring is not a problem: I can use std::codecvt_utf8<wchar_t> to convert both from and to std::wstring.
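For reference, the std::wstring side can be sketched as follows. This assumes the standard std::wstring_convert adapter; both it and std::codecvt_utf8 are deprecated since C++17 but remain widely available. The helper names wide_to_utf8 and utf8_to_wide are hypothetical. On Windows, where wchar_t is 16 bits, std::codecvt_utf8<wchar_t> treats the string as UCS-2, so std::codecvt_utf8_utf16<wchar_t> is the safer facet if characters outside the Basic Multilingual Plane (surrogate pairs) can occur.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: std::wstring -> UTF-8 std::string.
// Throws std::range_error if a character cannot be converted.
std::string wide_to_utf8(const std::wstring& w)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(w);
}

// Hypothetical helper: UTF-8 std::string -> std::wstring.
// Throws std::range_error on malformed UTF-8.
std::wstring utf8_to_wide(const std::string& s)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(s);
}
```

A std::wstring_convert object is cheap to create here; a string class could also keep one as a member if conversions are frequent.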

However, after extensive googling and searching on SO, I have yet to find a way to convert between a "normal/default" C++ std::string (which I assume on Windows uses the local system code page?) and a UTF-8 std::string.

I guess one option would be to first convert the std::string to a std::wstring using std::codecvt<wchar_t, char> and then convert that to UTF-8 as above. However, this seems quite inefficient, since, if I understand correctly, at least the first 128 values of a char should map straight to UTF-8 without conversion, regardless of localization.
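The two-step route described above, with an ASCII fast path, might look like the following sketch. It assumes the narrow encoding is ASCII-compatible (true of Windows ANSI code pages and common POSIX locales, but not of, e.g., EBCDIC) and that every wide character consumes at least one input byte. The function names narrow_to_wide and locale_to_utf8 are hypothetical; std::codecvt<wchar_t, char, std::mbstate_t> and std::wstring_convert are standard (the latter deprecated since C++17).

```cpp
#include <algorithm>
#include <codecvt>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Step 1 (hypothetical helper): narrow string in the locale's encoding ->
// std::wstring, using the locale's codecvt<wchar_t, char, mbstate_t> facet.
std::wstring narrow_to_wide(const std::string& s, const std::locale& loc)
{
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    if (s.empty()) return std::wstring();

    const Facet& facet = std::use_facet<Facet>(loc);
    std::mbstate_t state{};
    // Assumption: each wide character consumes at least one byte,
    // so s.size() wide characters is enough room for the output.
    std::wstring out(s.size(), L'\0');
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    auto result = facet.in(state, s.data(), s.data() + s.size(), from_next,
                           &out[0], &out[0] + out.size(), to_next);
    if (result != Facet::ok)
        throw std::range_error("narrow_to_wide: invalid or incomplete sequence");
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}

// Step 2 (hypothetical helper): narrow -> UTF-8. Pure 7-bit ASCII input is
// returned unchanged, since it is already valid UTF-8 for ASCII-compatible
// encodings; anything else goes through std::wstring.
std::string locale_to_utf8(const std::string& s, const std::locale& loc)
{
    bool ascii = std::all_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
    if (ascii) return s;

    // On Windows, std::codecvt_utf8_utf16<wchar_t> would also handle
    // surrogate pairs in the 16-bit wchar_t string.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> to_utf8;
    return to_utf8.to_bytes(narrow_to_wide(s, loc));
}
```

Passing std::locale("") uses the environment's locale; constructing a locale from a name that is not available throws std::runtime_error. Note that the fast path is all-or-nothing: a single non-ASCII byte sends the whole string through the wide conversion.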

I found this similar question: C++: how to convert ASCII or ANSI to UTF8 and stores in std::string. However, I'm a bit skeptical of that answer, as it is hard-coded to Latin-1, and I want this to work with all kinds of localization to be on the safe side.
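To see why a Latin-1-only conversion is so short, and why it does not generalize, here is a sketch of what such a conversion amounts to (it is not the linked answer's code). In ISO-8859-1 every byte value equals its Unicode code point, so bytes below 0x80 are copied unchanged and the rest become two-byte UTF-8 sequences. The function name latin1_to_utf8 is hypothetical.

```cpp
#include <string>

// Hypothetical helper, valid for ISO-8859-1 input only: byte value == code
// point, so bytes < 0x80 pass through and bytes 0x80-0xFF become 2 UTF-8 bytes.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // leading byte 110xxxxx
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation 10xxxxxx
        }
    }
    return out;
}
```

Applied to Windows-1252 text this already goes wrong: byte 0x80 is the euro sign (U+20AC) in that code page, but the sketch would emit U+0080.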

No answers involving Boost, thanks; I don't want the headache of getting my codebase to work with it.


Solution

If your "normal" string is encoded in the system's active ANSI code page (CP_ACP) and you want to convert it to UTF-8, then this should work on Windows:

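```cpp
#include <windows.h>
#include <string>

// Converts a string in the active ANSI code page (CP_ACP) to UTF-8,
// going through UTF-16 (std::wstring) as the intermediate form.
std::string acp_to_utf8(const std::string& codepage_str)
{
    if (codepage_str.empty()) return std::string();  // the APIs fail on 0-length input

    // dwFlags = 0: MB_COMPOSITE would split precomposed characters
    // (e.g. U+00E9) into a base letter plus a combining accent.
    const int in_len = static_cast<int>(codepage_str.length());
    int size = MultiByteToWideChar(CP_ACP, 0, codepage_str.c_str(), in_len, nullptr, 0);
    std::wstring utf16_str(size, L'\0');
    MultiByteToWideChar(CP_ACP, 0, codepage_str.c_str(), in_len, &utf16_str[0], size);

    // First call returns the required byte count (0 means failure; checks omitted).
    int utf8_size = WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(), size,
                                        nullptr, 0, nullptr, nullptr);
    std::string utf8_str(utf8_size, '\0');
    WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(), size,
                        &utf8_str[0], utf8_size, nullptr, nullptr);
    return utf8_str;
}
```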
Licensed under: CC-BY-SA with attribution