The conversion static_cast<char>(uc), where uc is of type unsigned char, is always valid: according to 3.9.1 [basic.fundamental] the representations of char, signed char, and unsigned char are identical, with char behaving identically to one of the two other types:
Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values. Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types, collectively called narrow character types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.11); that is, they have the same object representation. For narrow character types, all bits of the object representation participate in the value representation. For unsigned narrow character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types. In any particular implementation, a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined.
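A minimal sketch of what the quoted paragraph implies, assuming a C++11 compiler. The helper round_trip is a hypothetical name used only for illustration. The static_asserts follow directly from the quoted text; the round trip through char is guaranteed from C++20 on and, before that, relies on the implementation-defined conversion behaving as it does on two's complement platforms.

```cpp
#include <type_traits>

// char, signed char, and unsigned char are three distinct types...
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");

// ...that occupy the same amount of storage and share alignment requirements.
static_assert(sizeof(char) == sizeof(signed char) &&
              sizeof(char) == sizeof(unsigned char), "same size");
static_assert(alignof(char) == alignof(signed char) &&
              alignof(char) == alignof(unsigned char), "same alignment");

// Hypothetical helper: convert a byte to char and back again.
// Assumption: the unsigned char -> char conversion preserves the bit
// pattern (guaranteed since C++20, implementation-defined before that).
inline unsigned char round_trip(unsigned char uc)
{
    char c = static_cast<char>(uc);
    return static_cast<unsigned char>(c);
}
```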
Converting values outside the range of unsigned char to char will, of course, be problematic: before C++20 the result is implementation-defined when char is signed. That is, as long as you don't try to store funny values into the std::string you'd be OK. With respect to bit patterns, you can rely on the n-th bit translating into 2^n. There shouldn't be a problem storing binary data in a std::string when it is processed carefully.
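As an illustration of "processed carefully", here is a small sketch that stores bytes in a std::string and reads them back as unsigned values. The helper names to_string_buffer and byte_at are hypothetical, and the code assumes the char conversion preserves bit patterns (guaranteed since C++20, true on mainstream implementations before that).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: pack raw bytes into a std::string.
// Embedded zero bytes are fine because std::string tracks its own length.
inline std::string to_string_buffer(const std::vector<unsigned char>& bytes)
{
    std::string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes)
        out.push_back(static_cast<char>(b));
    return out;
}

// Hypothetical helper: read byte i back as an unsigned value in [0, 255].
// Going through unsigned char avoids sign extension when char is signed.
inline unsigned byte_at(const std::string& s, std::size_t i)
{
    return static_cast<unsigned char>(s[i]);
}
```

The important detail is the cast in byte_at: reading s[i] directly into an int would give a negative number for bytes above 0x7F on platforms where char is signed.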
That said, I don't buy into your premise: processing binary data mostly requires dealing with bytes, which are best manipulated using unsigned values. The few cases where you'd need to convert between char* and unsigned char* produce convenient compile-time errors when the conversion isn't spelled out explicitly, while accidentally messing up the use of char will be silent! That is, dealing with unsigned char will prevent errors. I also don't buy into the premise that you get all those nice string functions: for one, you are generally better off using the algorithms anyway, but also binary data is not string data. In summary: the recommendation for std::vector<unsigned char> isn't just coming out of thin air! It is deliberate, to avoid building hard-to-find traps into the design!
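To make the "silent" failure concrete, here is a rough sketch with two hypothetical functions, sum_bytes and sum_signed. The second uses signed char explicitly as a stand-in for a platform where plain char is signed, so the effect does not depend on the implementation's choice.

```cpp
#include <vector>

// Sum over unsigned bytes: each element contributes a value in [0, 255].
inline unsigned sum_bytes(const std::vector<unsigned char>& data)
{
    unsigned total = 0;
    for (unsigned char b : data)
        total += b;
    return total;
}

// The same loop over signed char: bytes with the high bit set are negative,
// so the result is quietly wrong for binary data. Nothing warns about this.
inline int sum_signed(const std::vector<signed char>& data)
{
    int total = 0;
    for (signed char b : data)
        total += b;
    return total;
}

// By contrast, mixing up the pointer types is rejected by the compiler:
//   char buffer[4] = {};
//   unsigned char* p = buffer;  // error: needs an explicit reinterpret_cast
```

The bytes 0x01 and 0xFF sum to 256 when treated as unsigned, but the same bit patterns held as signed char are 1 and -1 and sum to 0.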
The only mildly reasonable argument in favor of using char could be the one about string literals, but even that doesn't hold water given the user-defined string literals introduced in C++11:
    #include <cstddef>

    // Literal suffix _u: view a string literal's characters as unsigned bytes.
    unsigned char const* operator""_u(char const* s, std::size_t)
    {
        return reinterpret_cast<unsigned char const*>(s);
    }

    unsigned char const* hello = "hello"_u;
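A possible variation on the same idea, sketched under the assumption that you want an owning std::vector<unsigned char> rather than a pointer. The suffix _bytes is a hypothetical name, not something from the standard library. It uses the length argument, so embedded zero bytes in the literal survive.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical literal suffix _bytes: copy the literal's characters
// (excluding the terminating null) into a vector of unsigned bytes.
inline std::vector<unsigned char> operator""_bytes(char const* s, std::size_t n)
{
    unsigned char const* p = reinterpret_cast<unsigned char const*>(s);
    return std::vector<unsigned char>(p, p + n);
}
```

With this in scope, "a\0b"_bytes yields a vector of three elements, whereas treating the pointer as a C string would stop at the first zero byte.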