Truncating an int to char - is it defined?

https://stackoverflow.com/questions/5881895

28-10-2019

Question

unsigned char a, b;
b = something();
a = ~b;

A static analyzer complained of truncation in the last line, presumably because b is promoted to int before its bits are flipped and the result will be of type int.

I am only interested in the last byte of the promoted int - if b was 0x55, I need a to be 0xAA. My question is, does the C spec say anything about how the truncation happens, or is it implementation defined/undefined? Is it guaranteed that a will always get assigned the value I expect or could it go wrong on a conforming platform?

Of course, casting the result before assigning will silence the static analyzer, but I want to know if it is safe to ignore this warning in the first place.

Solution

The truncation happens as described in 6.3.1.3/2 of the C99 Standard:

... if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

Example for CHAR_BIT == 8, sizeof (unsigned char) == 1, sizeof (int) == 4

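So 0x55 is promoted to int, giving 0x00000055, then bitwise complemented to 0xFFFFFFAA. On a two's complement machine that bit pattern is the int value -86, so a single addition is enough:

         -86   /* 0xFFFFFFAA as a 32-bit two's complement int */
    +    256   /* UCHAR_MAX + 1, i.e. 0x00000100 */
    --------
         170   /* 0x000000AA */

which is plain 0xAA, as you'd expect. (Had the operand been the unsigned int 0xFFFFFFAA instead, you would subtract 0x100 repeatedly and arrive at the same 0xAA.)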
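As a rough sketch of the rule quoted above, the conversion can be spelled out by hand and compared with what the compiler does. The function names reduce_to_uchar and complement_by_conversion are made up for illustration, and the sketch assumes a two's complement int that is wider than unsigned char:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: bring an int into the range of unsigned char by
   adding or subtracting UCHAR_MAX + 1, as 6.3.1.3/2 describes. It is
   written as a remainder instead of a literal loop. */
unsigned char reduce_to_uchar(int value)
{
    long long modulus = (long long)UCHAR_MAX + 1;
    long long r = value % modulus;  /* C99 '%' truncates toward zero, so r may be negative */
    if (r < 0)
        r += modulus;               /* one more addition puts r in 0..UCHAR_MAX */
    assert(r >= 0 && r <= UCHAR_MAX);
    return (unsigned char)r;
}

/* Hypothetical helper: the same thing the assignment a = ~b; performs,
   with the conversion made explicit by a cast. */
unsigned char complement_by_conversion(unsigned char b)
{
    return (unsigned char)~b;
}
```

With an 8-bit unsigned char, both reduce_to_uchar(~0x55) and complement_by_conversion(0x55) should yield 0xAA.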

OTHER TIPS

The C standard specifies this for unsigned types:

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

In this case, if your unsigned char is 8 bits, it means that the result will be reduced modulo 256, which means that if b was 0x55, a will indeed end up as 0xAA.

But note that if unsigned char is wider than 8 bits (which is perfectly legal), you will get a different result. To ensure that you will portably get 0xAA as the result, you can use:

a = ~b & 0xff;

(The bitwise AND should be optimised out on platforms where unsigned char is 8 bits.)
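A minimal sketch of the masked form wrapped in a function, assuming a two's complement int. The name complement_low8 is hypothetical, and the cast on the return value is only there to keep a static analyzer quiet:

```c
/* Hypothetical helper: complement only the low 8 bits of b.
   ~b is computed in int after promotion; the mask discards every bit
   above bit 7, so the result does not depend on how wide unsigned char
   or int happens to be. */
unsigned char complement_low8(unsigned char b)
{
    return (unsigned char)(~b & 0xff);
}
```

For example, complement_low8(0x55) should give 0xAA.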

Note also that if the destination is a signed type and the value does not fit in it, the result is implementation-defined (or an implementation-defined signal is raised).

It will behave as you want it to. It is safe to cast the value.

This particular code example is safe. But there are reasons to warn against lax use of the ~ operator.

The reason behind this is that ~ on small integer variables is a potential bug in more complex expressions, because of the implicit integer promotions in C. Imagine if you had an expression like

a = ~b >> 4;

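~b is evaluated as an int, so the bits that arrive at the top of the low byte come from the complemented upper bits of the promoted value. They are ones, not the zeroes that might have been expected.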
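The difference can be sketched with two small functions. The names are invented for this example, and the value mentioned in the comment assumes an 8-bit unsigned char and a 32-bit two's complement int:

```c
/* Hypothetical helper: the expression from the text. ~b is a negative
   int, and right-shifting a negative value is implementation-defined.
   Under the assumptions above, bits 8..11 of the promoted value (all
   ones) end up in the top nibble of the low byte, so 0x55 gives 0xFA. */
unsigned char shift_without_cast(unsigned char b)
{
    return (unsigned char)(~b >> 4);
}

/* Hypothetical helper: cast back to unsigned char before shifting. The
   promoted upper bits are discarded first, so zeroes are shifted in. */
unsigned char shift_after_cast(unsigned char b)
{
    return (unsigned char)((unsigned char)~b >> 4);
}
```

Under those assumptions, shift_after_cast(0x55) gives 0x0A while shift_without_cast(0x55) gives 0xFA.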

If your static analyzer is set to include MISRA-C checks, you will get this warning for every use of the ~ operator, because MISRA requires the result of any operation on small integer types to be explicitly cast to the expected type, unsigned char in this case.

Let's take the case of a Win32 machine.
An int is 4 bytes, and converting it to a char gives exactly the result you would get by removing the three high-order bytes.

Since you are converting a char to a char, it doesn't matter what type it is promoted to in between.
~b adds three bytes on the left, changes their 0s to 1s, and the assignment then removes them again. Your one low-order byte is not affected.

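The same concept applies to other architectures (be it a 16-bit or a 64-bit machine).

Byte order does not matter here: the conversion is defined in terms of values, not memory layout, so the result is the same on little-endian and big-endian machines.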

Licensed under: CC-BY-SA with attribution
