Converting from wchar_t to char and vice versa

https://stackoverflow.com/questions/20936222

24-09-2022

Question

I am writing a template class String (just for learning purposes) and have a small problem. If T is wchar_t and U is char (or vice versa), what am I missing to make this method work?

template<typename U>
String<T> operator + (const U* other)
{
    String<T> newString;
    uint32_t otherLength = length(other);
    uint32_t stringLength = m_length + otherLength;
    uint32_t totalLength = stringLength + 1; // element count, including the terminator

    T *buffer = new T[totalLength];

    memset(buffer, 0, totalLength * sizeof(T));
    memcpy(buffer, m_value, m_length * sizeof(T));
    newString.m_value = buffer;
    newString.m_length = stringLength;
    // Raw byte copy of a U string into T storage: this is the step that fails when U != T
    memcpy(newString.m_value + m_length, other, otherLength * sizeof(T));

    return newString;
}
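The effect of that last memcpy can be reproduced in isolation with a small sketch. rawByteCopy is a hypothetical helper, not part of the question's class. It copies only the n bytes of the source. The question's version copies otherLength * sizeof(T) bytes, which also reads past the end of a char source. The sketch assumes sizeof(wchar_t) > 1, which holds on Windows, Linux and macOS.

```cpp
#include <cstring>
#include <vector>

// Hypothetical helper mimicking the question's final memcpy: it copies the raw
// bytes of a char string into wchar_t storage instead of converting each char.
std::vector<wchar_t> rawByteCopy(const char* s)
{
    std::size_t n = std::strlen(s);
    std::vector<wchar_t> buf(n + 1, L'\0');  // n wchar_t slots plus a terminator
    std::memcpy(buf.data(), s, n);           // copies n bytes, i.e. fewer than n wchar_ts
    return buf;
}
```

For "abcd", the first wchar_t ends up holding the bytes of several chars at once rather than L'a', and the later slots are never written.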

OK, Jared suggested a solution below, so something like this? (I know it is rough; it is just a template.)

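template<typename U>
String<T> operator + (const U* other)
{
    // sizeof(char) is always 1; sizeof(wchar_t) is platform dependent
    // (2 on Windows, 4 on Linux/macOS), so the bytes cannot simply be copied.
    uint32_t otherLength = length(other);
    uint32_t stringLength = m_length + otherLength;

    // Room for both parts plus a terminating null, counted in T elements.
    T* convertedString = new T[stringLength + 1];
    memcpy(convertedString, m_value, m_length * sizeof(T)); // same type, byte copy is fine

    // Convert the appended U string one element at a time.
    for (uint32_t i = 0; i < otherLength; ++i)
        convertedString[m_length + i] = ConvertChar(other[i]);
    convertedString[stringLength] = T(0);

    String<T> newString;
    newString.m_value = convertedString;
    newString.m_length = stringLength;
    return newString;
}

template <typename U>
T ConvertChar(U character)
{
    // Placeholder: plain widening/narrowing via a cast. This is only
    // correct for ASCII; real text needs an encoding-aware conversion.
    return static_cast<T>(character);
}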

Solution

Right now your code essentially uses memory copies to convert from a U* to String<T>. Unfortunately, that is not going to work, because wchar_t and char have different sizes and memory layouts: char is always a single byte, while wchar_t is typically 2 bytes on Windows and 4 bytes on Linux and macOS. What you need instead is a proper conversion function that is applied to every element of the string:

T ConvertChar(U c) { ... }
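One possible shape for that conversion, written as free functions so it compiles on its own, is sketched below. ConvertChar and ConvertString are hypothetical names here: the template parameters are reordered so T can be given explicitly, and U is deduced. The sketch deliberately accepts only 7-bit ASCII, where char and wchar_t agree on common platforms. Anything else is rejected rather than silently mangled.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical ConvertChar: converts a single element between character types,
// but only for 7-bit ASCII values. Other values need real transcoding.
template <typename T, typename U>
T ConvertChar(U c)
{
    // Convert to an unsigned value first, so a negative char (byte >= 0x80)
    // becomes a large number and is rejected instead of passing the check.
    unsigned long value = static_cast<unsigned long>(c);
    if (value > 0x7F)
        throw std::range_error("ConvertChar: non-ASCII value needs real transcoding");
    return static_cast<T>(value);
}

// Hypothetical ConvertString: applies ConvertChar to every element of a
// null-terminated U string and collects the results.
template <typename T, typename U>
std::basic_string<T> ConvertString(const U* s)
{
    std::basic_string<T> out;
    for (; *s != U(0); ++s)
        out.push_back(ConvertChar<T>(*s));
    return out;
}
```

For example, ConvertString<wchar_t>("hello") yields L"hello", and ConvertString<char>(L"hello") yields "hello". Throwing on non-ASCII input is a design choice. It makes the limitation visible, which the plain cast in a naive ConvertChar would hide.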

OTHER TIPS

You could simply widen when converting from char to wchar_t (i.e., use wchar_t(c)), but that probably does the wrong thing. When converting from wchar_t to char, you are obviously likely to lose information. It has also become common that individual character elements do not represent individual characters at all; they are just code units of UTF-8 or UTF-16. In that case, the elements need to be decoded and re-encoded into the other representation. The conversion is not one-to-one: some Unicode characters take several UTF-8 bytes, and some take two UTF-16 code units (a surrogate pair).
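To show why the mapping is not one-to-one, here is a minimal sketch of encoding a single code point into UTF-8 and into UTF-16. The function names encodeUtf8 and encodeUtf16 are hypothetical. Both functions assume the input is a valid Unicode scalar value (at most U+10FFFF and not a surrogate) and do no validation.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Assumes cp is a valid scalar value; no validation is done.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point as UTF-16: one code unit below U+10000,
// otherwise a surrogate pair (two code units).
std::u16string encodeUtf16(char32_t cp)
{
    if (cp < 0x10000)
        return std::u16string(1, static_cast<char16_t>(cp));
    cp -= 0x10000;
    std::u16string out;
    out += static_cast<char16_t>(0xD800 + (cp >> 10));   // high surrogate
    out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
    return out;
}
```

For instance, U+00E9 (é) becomes the two bytes C3 A9 in UTF-8. U+1F600 becomes four UTF-8 bytes and a two-unit surrogate pair in UTF-16. Widening or narrowing element by element cannot produce either result.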

You might want to have a look at std::codecvt<...> for converting between encodings.
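As a rough illustration of the std::codecvt interface, the sketch below decodes UTF-8 into UTF-32 using the standard std::codecvt<char32_t, char, std::mbstate_t> specialization and its in() member. That specialization is locale-independent but deprecated since C++20. The helper name utf8ToUtf32 is hypothetical. Converting to wchar_t specifically would instead use std::codecvt<wchar_t, char, std::mbstate_t>, whose behavior depends on the locale.

```cpp
#include <cwchar>      // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Decode UTF-8 bytes into UTF-32 code points with the standard
// std::codecvt<char32_t, char, std::mbstate_t> facet (deprecated in C++20).
std::u32string utf8ToUtf32(const std::string& utf8)
{
    using Facet = std::codecvt<char32_t, char, std::mbstate_t>;
    const Facet& facet = std::use_facet<Facet>(std::locale::classic());

    // A UTF-8 string never has more code points than bytes.
    std::u32string out(utf8.size(), U'\0');
    std::mbstate_t state{};
    const char* fromNext = nullptr;
    char32_t* toNext = nullptr;

    Facet::result r = facet.in(state,
                               utf8.data(), utf8.data() + utf8.size(), fromNext,
                               &out[0], &out[0] + out.size(), toNext);
    if (r != Facet::ok)
        throw std::runtime_error("invalid or incomplete UTF-8");

    out.resize(toNext - &out[0]);  // trim to the number of code points produced
    return out;
}
```

With this helper, the 6-byte UTF-8 string for "héllo" decodes to 5 code points, and a truncated sequence such as a lone "\xC3" makes in() report an incomplete input, which the sketch turns into an exception.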

Licensed under: CC-BY-SA with attribution
