Question

Hi, I have a few typedefs:

typedef unsigned char Byte;
typedef std::vector<Byte> ByteVector;
typedef std::wstring String;

I need to convert a String into a ByteVector. I have tried this:

String str = L"123";
ByteVector vect(str.begin(), str.end());

As a result, the vector contains 3 elements: 1, 2, 3. However, it is a wstring, so every character in the string is wide, and my expected result would be: 0, 1, 0, 2, 0, 3.

Is there any standard way to do that, or do I need to write a custom function?


Solution

Byte const* p = reinterpret_cast<Byte const*>(&str[0]);
std::size_t size = str.size() * sizeof(str.front());
ByteVector vect(p, p+size);
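
For reference, here is the same idea wrapped into a small function. This is only a sketch: the name toRawBytes is made up for illustration, and the output depends on the platform, because sizeof(wchar_t) is typically 2 on Windows and 4 on Linux/macOS, and the byte order follows the CPU. On a little-endian machine the bytes of each character come out low byte first, so the result is not the 0, 1, 0, 2, 0, 3 layout from the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned char Byte;
typedef std::vector<Byte> ByteVector;
typedef std::wstring String;

// Hypothetical helper (not from the original answer): copies the in-memory
// representation of every wchar_t in str into a byte vector.
// Size and byte order of the result are platform dependent.
ByteVector toRawBytes(const String& str)
{
    // data() is valid even for an empty string, so no special case is needed.
    const Byte* p = reinterpret_cast<const Byte*>(str.data());
    const std::size_t size = str.size() * sizeof(String::value_type);
    return ByteVector(p, p + size);
}
```

Calling toRawBytes(L"123") yields 3 * sizeof(wchar_t) bytes.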

OTHER TIPS

What is your actual goal? If you just want the bytes representing the wchar_t objects, a fairly trivial conversion will do the trick, although I wouldn't use just a cast to unsigned char const* but rather an explicit conversion.
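
One way such an explicit conversion might look is sketched below. The function name toBigEndianBytes and the choice of most-significant-byte-first order are assumptions made for illustration (that order matches the 0, 1, 0, 2, 0, 3 layout the question expects when wchar_t is 2 bytes wide). Unlike a cast, the byte order here does not depend on the CPU, although the number of bytes per character still depends on sizeof(wchar_t).

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned char Byte;
typedef std::vector<Byte> ByteVector;
typedef std::wstring String;

// Hypothetical helper: writes every wchar_t as sizeof(wchar_t) bytes,
// most significant byte first, regardless of the CPU's byte order.
ByteVector toBigEndianBytes(const String& str)
{
    ByteVector out;
    out.reserve(str.size() * sizeof(wchar_t));
    for (std::size_t c = 0; c < str.size(); ++c) {
        // Work on an unsigned value so that shifting is well defined
        // even where wchar_t is a signed type.
        const unsigned long value = static_cast<unsigned long>(str[c]);
        // Emit bytes from the highest byte index down to 0.
        for (std::size_t i = sizeof(wchar_t); i-- > 0; ) {
            out.push_back(static_cast<Byte>((value >> (8 * i)) & 0xFF));
        }
    }
    return out;
}
```

Note that the elements are character codes, so L"123" produces the code of '1' (49 in ASCII), not the number 1.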

On the other hand, if you actually want to convert the std::wstring into a sequence encoded using e.g. UTF-8 or UTF-16, as is usually the case when dealing with characters, the conversion becomes significantly more complex. Probably the easiest approach to convert to an encoding is to use C's wcstombs():

// needs <cstdlib>; wcstombs() returns (size_t)-1 if a character cannot be converted
std::vector<char> target(source.size() * 4);
size_t n = wcstombs(&target[0], &source[0], target.size());

The above fragment assumes that source isn't empty and that the character sequence is terminated by wchar_t(); passing source.c_str() instead of &source[0] guarantees the terminator. The conversion uses C's global locale and converts to whatever character encoding is set up there. Some platforms also provide a non-standard wcstombs_l(), where you can specify the locale.
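
A slightly more defensive version of the wcstombs() approach could look like the sketch below. The function name toMultibyte is hypothetical, and sizing the buffer with MB_CUR_MAX (the maximum number of bytes per character in the current locale) instead of a fixed factor of 4 is my own choice. Keep in mind that wcstombs() stops at the first embedded null character.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts source to the multibyte encoding of the
// current C locale. Throws if a character cannot be represented.
std::vector<char> toMultibyte(const std::wstring& source)
{
    // MB_CUR_MAX bytes per character is the worst case for the current
    // locale; the extra byte keeps the buffer non-empty for empty input.
    std::vector<char> target(source.size() * MB_CUR_MAX + 1);

    // c_str() guarantees a null-terminated wide string.
    const std::size_t n =
        std::wcstombs(&target[0], source.c_str(), target.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("character not representable in current locale");
    }

    target.resize(n);  // keep only the bytes that were actually written
    return target;
}
```

A program starts in the "C" locale, where only basic characters are guaranteed to convert. To get the user's encoding (often UTF-8 on Linux), call std::setlocale(LC_ALL, "") from <clocale> before converting.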

C++ has similar functionality in the std::codecvt<...> facet, but it is a bit harder to use. I can provide an example if necessary.
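
A rough sketch of what a codecvt-based conversion might look like follows. It is not the original author's example. The function name encodeWithFacet is made up, and treating every result other than ok as an error is a simplification. The target encoding is whatever the supplied std::locale defines.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts source to the narrow encoding of loc
// using the std::codecvt<wchar_t, char, std::mbstate_t> facet.
std::vector<char> encodeWithFacet(const std::wstring& source,
                                  const std::locale& loc)
{
    if (source.empty()) {
        return std::vector<char>();
    }

    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& facet = std::use_facet<Facet>(loc);

    // max_length() is the most bytes one wide character can produce.
    std::vector<char> target(source.size() * facet.max_length() + 1);

    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    const wchar_t* fromNext = 0;
    char* toNext = 0;
    const std::codecvt_base::result res = facet.out(
        state,
        source.data(), source.data() + source.size(), fromNext,
        &target[0], &target[0] + target.size(), toNext);

    if (res != std::codecvt_base::ok) {
        throw std::runtime_error("conversion failed or was incomplete");
    }

    // toNext points one past the last byte written.
    target.resize(static_cast<std::size_t>(toNext - &target[0]));
    return target;
}
```

For example, encodeWithFacet(L"123", std::locale::classic()) uses the "C" locale, while std::locale("") would pick up the user's environment.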

Licensed under: CC-BY-SA with attribution