Question

Will an integer value converted to a floating-point value and back again be the same as the original integer value?

For example:

unsigned x = 42;
double y = x;
unsigned z = y;

Assuming the compiler doesn't optimize out the floating-point conversion, will x == z always evaluate as true?

I suspect that any representation error in the floating-point conversion will always be an increase in value. Therefore, when the floating-point value is converted back to an integer value, the value is truncated which always results in the original integer value.

Is my assumption correct?


Solution

Assuming IEEE 754 double-precision format for double, the expression x == z will evaluate to 1 for all values of x up to 2^53. If your compiler offers 32-bit unsigned int, for instance, this means for all possible values of x.
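As a rough illustration of the claim above, the round trip can be checked directly. This is only a sketch: it assumes an IEEE 754 double and an unsigned int no wider than 53 bits, and the helper names round_trips and round_trips_at_extremes are made up for this example.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if x survives unsigned -> double -> unsigned. */
static int round_trips(unsigned x)
{
    /* volatile forces the value to be stored as a real double,
       so the conversion is not optimized away or kept in a wider register. */
    volatile double y = x;
    unsigned z = (unsigned)y;
    return x == z;
}

/* Hypothetical helper: checks the smallest and largest unsigned values.
   Only meaningful under the stated assumption that unsigned int
   has at most 53 value bits. */
static int round_trips_at_extremes(void)
{
    return round_trips(0u) && round_trips(UINT_MAX);
}
```

With a 32-bit unsigned int, round_trips(42u) and round_trips_at_extremes() should both return 1.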

You have edited your question to ask about the conversion from integer to float. In most C implementations, this conversion rounds according to the FPU rounding mode, which is round-to-nearest-even by default. This is asymmetric with the conversion from float to integer, which (as you point out) always truncates.

However, any error in the conversion from integer to float would not mean that you get a fractional part where there was none, but that you get the wrong integer altogether. For instance, the integer 2^53+1 is converted to the double that represents 2^53. For this reason, the truncation in the conversion from float to integer would not help even if the conversion from integer to float always rounded up.
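The point that the error lands on a different integer, not on a fractional part, can be sketched as follows. The sketch assumes an IEEE 754 double, the default round-to-nearest-even mode, and a 64-bit unsigned long long; via_double is a hypothetical name chosen for this example.

```c
/* Hypothetical helper: converts an integer to double and back.
   Assumes n is small enough that the resulting double still fits
   in unsigned long long (true for the values discussed here). */
static unsigned long long via_double(unsigned long long n)
{
    /* volatile keeps the intermediate value as a genuine double. */
    volatile double d = (double)n;
    /* d already holds a whole number, so truncation changes nothing. */
    return (unsigned long long)d;
}
```

Under those assumptions, via_double(9007199254740993ULL), that is 2^53+1, returns 9007199254740992 (2^53), while via_double(9007199254740995ULL), that is 2^53+3, returns 9007199254740996 (2^53+4). In both cases the result is a whole number that simply differs from the input.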

The rounding error in the conversion from integer to float can be larger than one: the integer 5555555555555555555, when converted to double, is rounded to 5555555555555555328, which happens to have a simpler representation in binary than the former. Half the time, the rounding goes upward: for instance, 5555555555555555855 is rounded to 5555555555555556352.
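The size and direction of the error for the two integers above can be measured with a small helper. This is a sketch under the same assumptions (IEEE 754 double, round-to-nearest-even, 64-bit long long); rounding_error is a hypothetical name, and the helper is only valid for inputs below 2^63.

```c
/* Hypothetical helper: signed difference between the double nearest to n
   and n itself. Negative means the conversion rounded down.
   Assumes n < 2^63 so that both casts to long long are well defined. */
static long long rounding_error(unsigned long long n)
{
    /* volatile keeps the intermediate value as a genuine double. */
    volatile double d = (double)n;
    return (long long)d - (long long)n;
}
```

For 5555555555555555555ULL this gives -227 (rounded down), and for 5555555555555555855ULL it gives 497 (rounded up). Doubles between 2^62 and 2^63 are spaced 1024 apart, so errors of up to 512 are possible in that range.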

OTHER TIPS

Any integer up to 2^53 has an exact representation as a double-precision floating-point number if double follows IEEE-754 (as your tag suggests). So, assuming int is 32-bit, yes, you can convert an unsigned to double without loss of precision.

Let's assume that your floating point double precision type is a 64 bit IEEE754 type. (The C standard does not insist on this but it's what you have tagged).

It depends on the size of your unsigned int. If it's 32 bit then yes, if 64 bit then not necessarily. (The cutoff is at the 53rd bit: 2^53 + 1 is the smallest positive integer that cannot be represented exactly in an IEEE floating point double.)

On 32 bit platforms, the answer is always yes.

On 64 bit platforms it depends on the compiler's data model. In LP64 and LLP64, unsigned int is 32 bit, but in ILP64 it is 64 bit. (Note that Win64 uses LLP64, which also keeps long at 32 bit.)
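Rather than reasoning about the data model, a program can ask the implementation directly. The following is a minimal sketch using the standard macros CHAR_BIT, FLT_RADIX and DBL_MANT_DIG; the function names are hypothetical, and the code assumes unsigned int has no padding bits.

```c
#include <float.h>
#include <limits.h>

/* Hypothetical helper: width of unsigned int in bits,
   assuming the type has no padding bits. */
static int unsigned_bits(void)
{
    return (int)(sizeof(unsigned) * CHAR_BIT);
}

/* Hypothetical helper: returns 1 if every unsigned int value is exactly
   representable as a double, i.e. the significand is binary and has
   at least as many digits as unsigned int has bits. */
static int unsigned_fits_in_double(void)
{
    return FLT_RADIX == 2 && unsigned_bits() <= DBL_MANT_DIG;
}
```

On an LP64 or LLP64 system with IEEE 754 doubles, unsigned_bits() would be 32 and DBL_MANT_DIG would be 53, so unsigned_fits_in_double() returns 1.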

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow