When does the signedness of an integer really matter?

https://stackoverflow.com/questions/8040005

c
signedness

23-02-2021

Question

Due to the way conversions and operations are defined in C, it seems to rarely matter whether you use a signed or an unsigned variable:

uint8_t u; int8_t i;

u = -3;    i = -3;
u *= 2;    i *= 2;
u += 15;   i += 15;
u >>= 2;   i >>= 2;

printf("%u",u); // -> 2
printf("%u",i); // -> 2

So, is there a set of rules to tell under which conditions the signedness of a variable really makes a difference?

Solution

It matters in these contexts:

division and modulo: -2/2 = -1, -2u/2 = UINT_MAX/2, -3%4 = -3, -3u%4 = 1
shifts: for negative signed values, the results of >> and << are implementation-defined and undefined, respectively. For unsigned values, they are always defined (for shift counts smaller than the width of the type).
relationals: -2 < 0, -2u > 0
overflows: x+1 > x may be assumed by the compiler to be always true iff x has signed type.
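The division, modulo and relational cases can be checked with a small sketch. The function names below are made up for illustration; each pair applies the same operator to the same argument values and differs only in the signedness of its parameters.

```c
#include <limits.h>   /* UINT_MAX, used in the usage note */
#include <stdbool.h>

/* Division and remainder: same operator, different operand types. */
int      div_signed(int a, int b)             { return a / b; }
unsigned div_unsigned(unsigned a, unsigned b) { return a / b; }
int      rem_signed(int a, int b)             { return a % b; }
unsigned rem_unsigned(unsigned a, unsigned b) { return a % b; }

/* Relationals: a negative value converted to unsigned is a large positive one. */
bool is_below_zero_signed(int x)        { return x < 0; }
bool is_above_zero_unsigned(unsigned x) { return x > 0; }
```

Called with the values from the list, div_signed(-2, 2) gives -1 while div_unsigned(-2u, 2u) gives UINT_MAX / 2, and rem_signed(-3, 4) gives -3 while rem_unsigned(-3u, 4u) gives 1.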

OTHER TIPS

Yes. Signedness will affect the result of Greater Than and Less Than operators in C. Consider the following code:

unsigned int a = -5;
unsigned int b = 7;

if (a < b)
    printf("Less");
else
    printf("More");

In this example, "More" is printed, because -5 is converted to a very large positive number (UINT_MAX - 4) when it is stored in an unsigned int.
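The same effect appears when a signed and an unsigned operand are compared directly, because the signed one is converted to unsigned first. Here is a minimal sketch of that conversion and of one possible way to compare by mathematical value instead; both function names are hypothetical helpers, not standard library functions.

```c
#include <stdbool.h>

/* What the compiler effectively does for `a < b` with an int and an unsigned:
   a is converted to unsigned, so a negative a becomes a large positive value. */
bool converted_less(int a, unsigned b) { return (unsigned)a < b; }

/* Compare by mathematical value: any negative a is below every unsigned b. */
bool value_less(int a, unsigned b) { return a < 0 || (unsigned)a < b; }
```

With a = -5 and b = 7, converted_less returns false (the "More" branch), while value_less returns true.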

Signedness also affects arithmetic that mixes variables of different sizes. Consider this example:

unsigned char a = -5;
signed short b = 12;

printf("%d", a+b);

The printed result is 263, not the expected 7. This is because -5 is stored as 251 in the unsigned char. Wraparound makes your operations work correctly for same-sized variables, but when widening, the compiler does not extend the sign bit of unsigned variables; it keeps their positive value in the larger type. Study how two's complement works and you'll see where this result comes from.
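The widening step can be isolated in a short sketch. It assumes the fixed-width types from <stdint.h> are available (8-bit bytes); the function names are invented for this illustration.

```c
#include <stdint.h>

/* Widening an unsigned 8-bit value zero-extends it: 251 stays 251. */
int widen_u8(uint8_t v) { return v; }

/* Widening a signed 8-bit value sign-extends it: -5 stays -5. */
int widen_i8(int8_t v) { return v; }

/* The sum from the example, once for each signedness of the narrow operand. */
int sum_with_u8(uint8_t a, short b) { return a + b; }
int sum_with_i8(int8_t a, short b)  { return a + b; }
```

sum_with_u8((uint8_t)-5, 12) yields 263, while sum_with_i8(-5, 12) yields 7.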

It affects the range of values that you can store in the variable.

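It is relevant mainly in comparisons, as long as the operands are at least as wide as int (narrower types such as uint8_t are promoted to int before the arithmetic happens):

unsigned u = 2; int i = 2;

printf("%d", (u-3) < 0); // -> 0, because u-3 wraps around to UINT_MAX
printf("%d", (i-3) < 0); // -> 1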
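One caveat is worth a sketch: whether the unsigned subtraction wraps around depends on the width of the type relative to int, because of integer promotion. The function names below are hypothetical, and the uint8_t case assumes int is wider than 8 bits, which holds on every conforming implementation.

```c
#include <stdint.h>

/* uint8_t is promoted to int before the subtraction, so u - 3 can go negative. */
int u8_minus3_is_negative(uint8_t u) { return (u - 3) < 0; }

/* unsigned int stays unsigned: u - 3 wraps around and is never below zero.
   Compilers may warn that this comparison is always false, which is the point. */
int uint_minus3_is_negative(unsigned u) { return (u - 3u) < 0; }

/* Signed arithmetic behaves as it does on paper. */
int int_minus3_is_negative(int i) { return (i - 3) < 0; }
```

For an input of 2, the uint8_t and int versions return 1 and the unsigned int version returns 0.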

Overflow on unsigned integers just wraps around. On signed values it is undefined behavior: anything can happen.
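A small sketch of both sides: unsigned wraparound can simply be relied on, whereas signed overflow has to be ruled out before the operation is performed. add_would_overflow is a hypothetical helper written for this illustration, not a standard function.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1. */
unsigned wrapping_increment(unsigned x) { return x + 1u; }

/* Check for signed overflow without ever computing a + b. */
bool add_would_overflow(int a, int b)
{
    if (b > 0) return a > INT_MAX - b;  /* would exceed INT_MAX */
    if (b < 0) return a < INT_MIN - b;  /* would fall below INT_MIN */
    return false;
}
```

wrapping_increment(UINT_MAX) returns 0, and add_would_overflow(INT_MAX, 1) returns true, so the caller can avoid the undefined addition.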

The signedness of 2's complement numbers is simply a matter of how you interpret the bits. Imagine the 3-bit numbers:

000
001
010
011
100
101
110
111

If you think of 000 as zero and read the numbers the way that is natural to humans, you would interpret them like this:

000: 0
001: 1
010: 2
011: 3
100: 4
101: 5
110: 6
111: 7

This is called an "unsigned integer". Every bit pattern is read as a number greater than or equal to zero.

Now, what if you want some of the numbers to be negative? Well, 2's complement comes to the rescue. 2's complement is known to most people as just a formula, but in truth it is just congruence modulo 2^n, where n is the number of bits in your number.

Let me give you a few examples of congruence:

2 = 5 = 8 = -1 = -4 modulo 3
-2 = 6 = 14 modulo 8

Now, just for convenience, let's say you decide to use the leftmost bit of a number as its sign. So you want to have:

000: 0
001: positive
010: positive
011: positive
100: negative
101: negative
110: negative
111: negative

Viewing your numbers as congruent modulo 2^3 (= 8), you know that:

4 = -4
5 = -3
6 = -2
7 = -1

Therefore, you view your numbers as:

000: 0
001: 1
010: 2
011: 3
100: -4
101: -3
110: -2
111: -1

As you can see, the actual bits for -3 and 5 (for example) are the same (if the number has 3 bits). Therefore, writing x = -3 or x = 5 gives you the same result.
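The two readings of the same bits can be written down as a sketch. The helper names are invented here, and the code is only meant for small widths such as n = 3 or n = 8, so that 2^n fits comfortably in an int.

```c
/* Low n bits of `bits`, read as an unsigned number. */
unsigned as_unsigned(unsigned bits, unsigned n)
{
    return bits & ((1u << n) - 1u);
}

/* The same bits read as two's complement: patterns with the leftmost
   bit set stand for value - 2^n, the congruent negative number. */
int as_signed(unsigned bits, unsigned n)
{
    unsigned value = as_unsigned(bits, n);
    unsigned sign  = 1u << (n - 1);
    return (value & sign) ? (int)value - (int)(sign << 1) : (int)value;
}
```

For the pattern 101 with n = 3, as_unsigned returns 5 and as_signed returns -3, matching the two tables.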

Interpreting numbers as congruent modulo 2^n has other benefits. If you add two numbers, one negative and one positive, it can happen on paper that you get a carry that is thrown away, yet the result is still correct. Why? That carry was 2^n, which is congruent to 0 modulo 2^n! Isn't that convenient?

Overflow is another case of congruence. In our example, if you add the two unsigned numbers 5 and 6, you get 3, which is congruent to the true sum 11 modulo 8.
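Discarding the carry is the same as reducing the sum modulo 2^n, which a mask expresses directly. add_nbit is a hypothetical helper for this illustration and assumes a small n (below 16) so that the intermediate sum fits in an unsigned.

```c
/* Add two n-bit patterns and drop the carry out of the top bit,
   i.e. compute (a + b) modulo 2^n. */
unsigned add_nbit(unsigned a, unsigned b, unsigned n)
{
    unsigned mask = (1u << n) - 1u;
    return ((a & mask) + (b & mask)) & mask;
}
```

With n = 3, add_nbit(5, 6, 3) returns 3 (the overflow case), and add_nbit(6, 3, 3) returns 1: read as signed, that is -2 + 3 = 1, with the carry thrown away.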

So, why do you use signed and unsigned? For the CPU there is actually very little difference. For you however:

If the number has n bits, the unsigned represents numbers from 0 to 2^n-1
If the number has n bits, the signed represents numbers from -2^(n-1) to 2^(n-1)-1

So, for example, if you assign -1 to an unsigned number, it's the same as assigning 2^n-1 to it.
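The range formulas can be turned into a sketch for checking concrete widths. The function names are made up, 64-bit types are used only so that the results fit, and n is assumed to satisfy 0 < n < 63.

```c
#include <stdint.h>

/* Range limits of an n-bit integer. */
uint64_t unsigned_max(unsigned n) { return (UINT64_C(1) << n) - 1; }
int64_t  signed_min(unsigned n)   { return -(INT64_C(1) << (n - 1)); }
int64_t  signed_max(unsigned n)   { return (INT64_C(1) << (n - 1)) - 1; }

/* Converting -1 to an unsigned 8-bit type yields 2^8 - 1. */
uint8_t minus_one_as_u8(void) { return (uint8_t)-1; }
```

For n = 8 these give 255, -128 and 127, and minus_one_as_u8() returns 255.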

As per your example, that's exactly what you are doing. You are assigning -3 to a uint8_t, which is out of range for the type, but the conversion is well defined: the value is reduced modulo 2^8, so you are in effect assigning 253 to it. Then all the rest of the operations are the same for both types and you end up getting the same result.

There is, however, a point that your example misses. On typical implementations, operator >> on a negative signed number extends the sign when shifting (strictly speaking, the result is implementation-defined). Since both of your values are 9 before shifting, you don't notice this. If you didn't have the +15, you would have -6 in i and 250 in u, and >> 2 would then give -2 in i (bit pattern 254 when read as an 8-bit unsigned value) and 62 in u. (See Peter Cordes' comment on the original answer for a few technicalities.)

To understand this better, take this example:

  (signed)101011 (-21) >> 3 ----> 111101 (-3)
(unsigned)101011 ( 43) >> 3 ----> 000101 ( 5)

If you notice, floor(-21/8) is actually -3 and floor(43/8) is 5. However, -3 and 5 are not equal, and they are not congruent modulo 64 either (64 because there are 6 bits).
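Both kinds of shift can be sketched without relying on implementation-defined behavior. The names are hypothetical, and the arithmetic version assumes a two's complement int, where ~x is non-negative for every negative x.

```c
/* Logical right shift of a 6-bit pattern: zeros enter from the left. */
unsigned shift_right_unsigned6(unsigned bits, unsigned s)
{
    return (bits & 0x3Fu) >> s;
}

/* Arithmetic right shift written portably. For negative x, ~x is
   non-negative, so the inner shift is fully defined; the outer ~
   restores the sign-extended result, which equals floor(x / 2^s). */
int shift_right_arithmetic(int x, unsigned s)
{
    return x < 0 ? ~(~x >> s) : x >> s;
}
```

shift_right_unsigned6(0x2B, 3) returns 5 and shift_right_arithmetic(-21, 3) returns -3, the two results from the 6-bit example.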

Licensed under: CC-BY-SA with attribution
