Question

Is it possible for an explicit cast of, say, int32_t to uint32_t, to alter the bit representation of the value?

For example, given that I have the following union:

typedef union {
    int32_t signed_val;
    uint32_t unsigned_val;
} signed_unsigned_t;

Are these code segments guaranteed by the spec to have the same behaviour?

uint32_t reinterpret_signed_as_unsigned(int32_t input) {
    return (uint32_t) input;
}

and

uint32_t reinterpret_signed_as_unsigned(int32_t input) {
    signed_unsigned_t converter;
    converter.signed_val = input;
    return converter.unsigned_val;
}

I'm considering C99 here. I've seen a few similar questions, but they all seemed to be discussing C++, not C.

Solution

Casting a signed integer type to an unsigned integer type of the same width can change the representation, if you can find a machine with sign-magnitude or ones-complement signed representations. But the types int32_t and uint32_t are guaranteed to use two's-complement representation, so in that particular case the representation cannot change.

Conversion of a signed integer to an unsigned integer is well-defined in the standard, section 6.3.1.3. The relevant rule is the second paragraph:

  1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
  2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
  3. ...

So the result has to be, in effect, the same as what a bit-for-bit copy would have produced had the negative number been stored in two's complement. A conforming implementation is allowed to use sign-magnitude or ones-complement; in both cases, the representation of a negative integer has to be modified when it is cast to unsigned.
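As a concrete illustration (mine, not part of the original answer), here is that rule applied to -1: the conversion adds one more than UINT32_MAX, i.e. 2^32, exactly once, giving 4294967295, which is the all-ones bit pattern:

#include <inttypes.h>
#include <stdio.h>

int main(void) {
    int32_t s = -1;
    /* 6.3.1.3p2: -1 is out of range for uint32_t, so 2^32 is added once,
       giving 4294967295, the same bits as the two's-complement source. */
    uint32_t u = (uint32_t)s;
    printf("%" PRIu32 "\n", u);   /* prints 4294967295 on any implementation
                                     that provides these types */
    return 0;
}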


Summarizing a lengthy and interesting discussion in the comments:

  • In the precise example in the OP, which uses int32_t and uint32_t, the representations must be equal if the program compiles, because C99 requires int32_t and uint32_t to be exactly 32 bits long with no padding, and requires int32_t to use 2's-complement representation. It does not, however, require those types to exist; a ones-complement implementation could simply not define int32_t, and still conform.

  • My interpretation of type-punning is below the horizontal rule. @R.. pointed us to a Defect Report from 2004 which seems to say that type-punning is either OK or fires a trap, which is closer to implementation-defined behaviour than undefined behaviour. On the other hand, the suggested resolution of that DR doesn't seem to be in the C11 document, which says (6.2.6.1(5)):

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined.

That seems to me to be saying that type-punning is undefined behaviour if one of the participating types has a trap representation (and consequently is not undefined behaviour if the reading type does not have a trap representation). On the other hand, no type is required to have a trap representation, and only a few types are prohibited from having one: char and union types (but not members of union types), as well as whichever of the [u]intN_t types are implemented.

My previous statement on type-punning follows:


The storage-punning union has undefined behaviour. But without invoking flying lizards, it is somewhat expected that sign-magnitude or ones-complement machines may throw a hardware exception if a certain value is stored as unsigned and then accessed as signed.

Both ones-complement and sign-magnitude have two possible representations of 0, one for each value of the sign bit. The one with the sign bit set, "negative zero", is allowed to be a trap representation; consequently, accessing the value (even just to copy it) as a signed integer could trigger the trap.

Although the C compiler would be within its rights to suppress the trap, say by copying the value with memcpy or an unsigned opcode, it is unlikely to do so, because that would surprise a programmer who knew her program was running on a machine with trapping negative zeros and expected the trap to fire on an illegal value.
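As an aside (my addition, not something the answer itself proposes), a program that just wants the bits copied can avoid reading through a differently-typed union member altogether by copying the object representation with memcpy; on the two's-complement implementations that actually provide int32_t and uint32_t, the result is the same as the cast:

#include <stdint.h>
#include <string.h>

uint32_t copy_signed_bits_to_unsigned(int32_t input) {   /* illustrative name */
    uint32_t result;
    /* Copy the bytes of the object representation rather than reading
       the object through a differently-typed lvalue. */
    memcpy(&result, &input, sizeof result);
    return result;
}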

OTHER TIPS

In the particular case you mention, a conversion from int32_t to uint32_t, the bit representation will be the same.

The standard specifically requires intN_t to be "a signed integer type with width N, no padding bits, and a two’s complement representation". Furthermore, corresponding signed and unsigned types must have the same representation for values within their shared range:

A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value.

There is one very small possible loophole: in principle, an implementation could, for example, make int32_t a typedef for int, and uint32_t a typedef for unsigned long, where int and long are both 32 bits but have different byte orders. But that would only happen in a deliberately perverse implementation. Correction: This is not possible for a conforming implementation. int32_t and uint32_t must denote corresponding signed and unsigned types.

The above applies only because you happened to choose int32_t and uint32_t for your example, and the standard places very specific restrictions on their representation. (And if an implementation can't meet those restrictions, then it simply won't define int32_t or uint32_t.)
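If you want to rely on that, a compile-time guard is straightforward; the sketch below (my addition) uses the fact that C99 requires the <stdint.h> limit macros such as INT32_MAX and UINT32_MAX to be defined exactly when the corresponding types are provided:

#include <stdint.h>

#if !defined(INT32_MAX) || !defined(UINT32_MAX)
#error "this implementation does not provide int32_t/uint32_t"
#endif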

More generally, though, signed types are permitted to have one of three representations:

  • sign and magnitude, where setting the sign bit to 1 negates a number;
  • two's complement, where negation is equivalent to a bitwise complement followed by adding 1; and
  • one's complement, where negation is equivalent to a bitwise complement.

The vast majority of modern systems use two's complement (and have no padding bits). On such systems, signed-to-unsigned conversion with types of the same size generally does not change the bit representation. (The semantics of type conversions are defined in terms of values, but are designed to be convenient for two's complement systems.)

But for a system that uses either sign and magnitude or one's complement, signed-to-unsigned conversion must preserve the value, which means that conversion of a negative value must change the representation.

If the value is in the range of both the signed and the unsigned types, then both the value and representation are unchanged by conversions.

Otherwise, a signed-to-unsigned conversion can preserve the bit representation only when the implementation uses two's complement for the signed type; with one's complement or sign and magnitude, the conversion must change the representation. The conversion in the other direction is implementation-defined, so it may or may not change the representation.
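A small worked example (mine, shrunk to 8 bits so the numbers stay short): the value -5 under each permitted signed representation, versus what the value-preserving conversion to an 8-bit unsigned type must produce, namely 256 - 5 = 251 = 0xFB.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Stored representation of -5 in an 8-bit signed object:
         sign and magnitude:  1000 0101  (0x85)
         one's complement:    1111 1010  (0xFA)
         two's complement:    1111 1011  (0xFB)
       The conversion below must yield the value 251 (0xFB) on every
       implementation, so only a two's-complement machine can simply
       reuse the stored bits. */
    int8_t  s = -5;
    uint8_t u = (uint8_t)s;
    printf("0x%02X\n", (unsigned)u);   /* prints 0xFB */
    return 0;
}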

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow