Adding unsigned integers in C

https://stackoverflow.com/questions/7328334

27-10-2019
|

문제

Here are two very simple programs. I would expect to get the same output, but I don't. I can't figure out why. The first outputs 251. The second outputs -5. I can understand why the 251. However, I don't see why the second program gives me a -5.

PROGRAM 1:

#include <stdio.h>

int main()
{

unsigned char  a;
unsigned char  b;
unsigned int  c;

a = 0;
b= -5;

c =  (a + b);

printf("c hex: %x\n", c);
printf("c dec: %d\n",c);

}

Output:

c hex: fb
c dec: 251

PROGRAM 2:

#include <stdio.h>

int main()
{

unsigned char  a;
unsigned char  b;
unsigned int  c;

a = 0;
b=  5;

c =  (a - b);

printf("c hex: %x\n", c);
printf("c dec: %d\n",c);

}

Output:

c hex: fffffffb
c dec: -5

해결책

There are two separate issues here. The first is the fact that you are getting different hex values for what looks like the same operations. The underlying fact that you are missing is that chars are promoted to ints (as are shorts) to do arithmetic. Here is the difference:

a = 0  //0x00
b = -5 //0xfb
c = (int)a + (int)b

Here, a is extended to 0x00000000 and b is extended to 0x000000fb (not sign extended, because it is an unsigned char). Then, the addition is performed, and we get 0x000000fb.

a = 0  //0x00
b = 5  //0x05
c = (int)a - (int)b

Here, a is extended to 0x00000000 and b is extended to 0x00000005. Then, the subtraction is performed, and we get 0xfffffffb.

The solution? Stick with chars or ints; mixing them can cause things you won't expect.

The second problem is that an unsigned int is being printed as -5, clearly a signed value. However, in the string, you told printf to print its second argument, interpreted as a signed int (that's what "%d" means). The trick here is that printf doesn't know what the types of the variables you passed in. It merely interprets them in the way the string tells it to. Here's an example where we tell printf to print a pointer as an int:

int main()
{
    int a = 0;
    int *p = &a;
    printf("%d\n", p);
}

When I run this program, I get a different value each time, which is the memory location of a, converted to base 10. You may note that this kind of thing causes a warning. You should read all of the warnings your compiler gives you, and only ignore them if you're completely sure you are doing what you intend to.

다른 팁

In the first program, b=-5; assigns 251 to b. (Conversions to an unsigned type always reduce the value modulo one plus the max value of the destination type.)

In the second program, b=5; simply assigns 5 to b, then c = (a - b); performs the subtraction 0-5 as type int due to the default promotions - put simply, "smaller than int" types are always promoted to int before being used as operands of arithmetic and bitwise operators.

Edit: One thing I missed: Since c has type unsigned int, the result -5 in the second program will be converted to unsigned int when the assignment to c is performed, resulting in UINT_MAX-4. This is what you see with the %x specifier to printf. When printing c with %d, you get undefined behavior, because %d expects a (signed) int argument and you passed an unsigned int argument with a value that's not representable in plain (signed) int.

You're using the format specifier %d. That treats the argument as a signed decimal number (basically int).

You get 251 from the first program because (unsigned char)-5 is 251 then you print it like a signed decimal digit. It gets promoted to 4 bytes instead of 1, and those bits are 0, so the number looks like 0000...251 (where the 251 is binary, I just didn't convert it).

You get -5 from the second program because (unsigned int)-5 is some large value, but casted to an int, it's -5. It gets treated like an int because of the way you use printf.

Use the format specifier %ud to print unsigned decimal values.

What you're seeing is the result of ~~how the underlying machine is representing the numbers~~ how the C standard defines signed to unsigned type conversions (for the arithmetic) and how the underlying machine is representing numbers (for the result of the undefined behavior at the end).

When I originally wrote my response I had assumed that the C standard didn't explicitly define how signed values should be converted to unsigned values, since the standard doesn't define how signed values should be represented or how to convert unsigned values to signed values when the range is outside that of the signed type.

However, it turns out that the standard does explicitly define that when converting from negative signed to positive unsigned values. In the case of an integer, a negative signed value x will be converted to UINT_MAX+1-x, just as if it were stored as a signed value in two's complement and then interpreted as an unsigned value.

So when you say:

unsigned char  a;
unsigned char  b;
unsigned int c;

a = 0; 
b = -5;
c = a + b;

b's value becomes 251, because -5 is converted to an unsigned type of value UCHAR_MAX-5+1 (255-5+1) using the C standard. It's then after that conversion that the addition takes place. That makes a+b the same as 0 + 251, which is then stored in c. However, when you say:

unsigned char  a;
unsigned char  b;
unsigned int c;

a = 0;
b = 5;
c = (a-b);

printf("c dec: %d\n", c);

In this case, a and b are promoted to unsigned ints, to match with c, so they remain 0 and 5 in value. However 0 - 5 in unsigned integer math leads to an underflow error, which is defined to result in UINT_MAX+1-5. If this had happened before the promotion, the value would be UCHAR_MAX+1-5 (i.e. 251 again).

However, the reason you see -5 printed in your output is a combination of the fact that the unsigned integer UINT_MAX-4 and -5 have the same exact binary representation, just like -5 and 251 do with a single-byte datatype, and the fact that when you used "%d" as the formatting string, that told printf to interpret the value of c as a signed integer instead of an unsigned integer.

Since a conversion from unsigned values to signed values for invalid values isn't defined, the result becomes implementation specific. In your case, since the underlying machine uses two's complement for signed values, the result is that the unsigned value UINT_MAX-4 becomes the signed value -5.

The only reason this doesn't happen in the first program because an unsigned int and a signed int can both represent 251, so converting between the two is well defined and using "%d" or "%u" doesn't matter. In the second program, however, it results in undefined behavior and becomes implementation specific since your value of UINT_MAX-4 went outside the range of an signed int.

What's happening under the hood

It's always good to double check what you think is happening or what should happen with what's actually happening, so let's look at the assembly language output from the compiler now to see exactly what's going on. Here's the meaningful part of the first program:

    mov     BYTE PTR [rbp-1], 0   ; a becomes 0
    mov     BYTE PTR [rbp-2], -5  ; b becomes -5, which as an unsigned char is also 251
    movzx   edx, BYTE PTR [rbp-1] ; promote a by zero-extending to an unsigned int, which is now 0
    movzx   eax, BYTE PTR [rbp-2] ; promote b by zero-extending to an unsigned int which is now 251
    add     eax, edx  ; add a and b, that is, 0 and 251

Notice that although we store a signed value of -5 in the byte b, when the compiler promotes it, it promotes it by zero-extending the number, meaning it's being interpreted as the unsigned value that 11111011 represents instead of the signed value. Then the promoted values are added together to become c. This is also why the C standard defines signed to unsigned conversions the way it does -- it's easy to implement the conversions on architectures that use two's complement for signed values.

Now with program 2:

    mov     BYTE PTR [rbp-1], 0 ; a = 0
    mov     BYTE PTR [rbp-2], 5 ; b = 5
    movzx   edx, BYTE PTR [rbp-1] ; a is promoted to 32-bit integer with value 0
    movzx   eax, BYTE PTR [rbp-2] ; b is promoted to a 32-bit integer with value 5
    mov     ecx, edx 
    sub     ecx, eax ; a - b is now done as 32-bit integers resulting in -5, which is '4294967291' when interpreted as unsigned

We see that a and b are once again promoted before any arithmetic, so we end up subtracting two unsigned ints, which leads to a UINT_MAX-4 due to underflow, which is also -5 as a signed value. So whether you interpret it as a signed or unsigned subtraction, due to the machine using two's complement form, the result matches the C standard without any extra conversions.

Assigning a negative number to an unsigned variable is basically breaking the rules. What you're doing is converting the negative number to a large positive number. You're not even guaranteed, technically, that the conversion is the same from one processor to another -- on a 1's complement system (if any still existed) you'd get a different value, eg.

So you get what you get. You can't expect signed algebra to still apply.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow