Question

I know that in C and Java, the underlying representation of float is IEEE754-32 and that of double is IEEE754-64.

In expressions, a float is automatically promoted to double. How does that happen? Take 3.7f as an example. Does the process look like this?

  1. 3.7f is represented in memory using IEEE754. It fits in 4 bytes.
  2. During calculation, it may be loaded into a 64-bit register (or some other 64-bit location), which converts 3.7f into its IEEE754-64 representation.

Solution

It is very implementation-dependent.

For example, on the x86 platform the FPU (x87) instruction set includes instructions for loading and storing data in the IEEE754 float and double formats (as well as many other formats). The data is loaded into internal FPU registers that are 80 bits wide. So in reality, whenever the x87 FPU is used (rather than SSE instructions), floating-point calculations are performed with 80-bit precision, i.e. all floating-point data is actually promoted to 80-bit precision. How the data is represented inside those registers is largely irrelevant, since you cannot observe it directly anyway.

This means that on the x86 FPU there is no such thing as a single-step float-to-double conversion. Whenever the need for such a conversion arises, it is actually implemented as a two-step conversion: float-to-internal-FPU and internal-FPU-to-double.
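To make the 3.7f case concrete, here is a minimal sketch that assumes an implementation where float and double are IEEE754-32 and IEEE754-64. The helper names (f32_bits, f64_bits, widen_direct, widen_two_step) are made up for illustration, and long double merely stands in for "some wider internal format". Widening never changes the value, so going through a wider format first should give the same double as the direct conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw bits of a float (assumes a 32-bit float). */
uint32_t f32_bits(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: raw bits of a double (assumes a 64-bit double). */
uint64_t f64_bits(double d)
{
    uint64_t u;
    assert(sizeof d == sizeof u);
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Single conversion as written in the source code. */
double widen_direct(float f)
{
    return (double)f;
}

/* Two steps: float -> wider format -> double. */
double widen_two_step(float f)
{
    long double wide = f;   /* exact: every float value fits */
    return (double)wide;    /* exact: the value came from a float */
}
```

On an IEEE754 target, f32_bits(3.7f) is 0x406CCCCD and f64_bits(widen_direct(3.7f)) is 0x400D9999A0000000: the exponent is re-biased and the 23 fraction bits are padded with zeros. Note that the result is not the same double as the literal 3.7, because the rounding already happened when 3.7 was stored as a float.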

This, by the way, creates a significant semantic difference between the x86 FPU computation model and the C/C++ computation model. To fully match the language model, the generated code has to forcefully reduce the precision of intermediate floating-point results, which hurts performance. Many compilers provide options that control the FPU computation model, allowing the user to opt for strict C/C++ conformance, better performance, or something in between.
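A small sketch of why intermediate precision matters, assuming an IEEE754 float and double. The function names are hypothetical. strict_float rounds the product to float before adding (the volatile store is one common way to force that), while wide_intermediate keeps the product in double, here standing in for any wider internal format. eval_method just reports the standard FLT_EVAL_METHOD macro from <float.h>, which tells you how the current build evaluates expressions.

```c
#include <assert.h>
#include <float.h>

/* Reports FLT_EVAL_METHOD: 0 = evaluate in the operand type,
   1 = float and double evaluated as double, 2 = everything evaluated
   as long double, -1 = indeterminable. */
int eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;
#endif
}

/* The product is rounded to float before the addition. */
float strict_float(float a, float b, float c)
{
    volatile float product = a * b;  /* the store forces rounding to float */
    return product + c;
}

/* The product keeps its extra bits until the final conversion. */
float wide_intermediate(float a, float b, float c)
{
    double product = (double)a * (double)b;
    return (float)(product + c);
}

/* Inputs chosen so that the two strategies disagree:
   a = 1 + 2^-12, so a*a = 1 + 2^-11 + 2^-24, and c = -(1 + 2^-11). */
void check_intermediate_precision(void)
{
    float a = 0x1.001p0f;
    float c = -0x1.002p0f;

    assert(strict_float(a, a, c) == 0.0f);           /* 2^-24 was rounded away */
    assert(wide_intermediate(a, a, c) == 0x1p-24f);  /* 2^-24 survived */
}
```

Both answers are "correct" for their own model, which is exactly the kind of difference the compiler options mentioned above let you choose between.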

Not so many years ago, the FPU was an optional component of the x86 platform. Floating-point computations on FPU-less platforms were performed in software, either by emulating the FPU or by generating code without any FPU instructions at all. Such implementations could work differently, for example by performing a software conversion from IEEE754 float to IEEE754 double directly.
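For illustration only, a direct software conversion could look roughly like the sketch below. It works purely on bit patterns with integer operations and assumes the IEEE754-32 and IEEE754-64 layouts. widen_f32_bits is a made-up name, not a function from any real soft-float library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert IEEE754-32 bits to IEEE754-64 bits
   without using any floating-point instruction. */
uint64_t widen_f32_bits(uint32_t f)
{
    uint64_t sign = (uint64_t)(f >> 31) << 63;
    uint32_t exp  = (f >> 23) & 0xFFu;   /* 8-bit biased exponent */
    uint32_t frac = f & 0x7FFFFFu;       /* 23-bit fraction */

    if (exp == 0xFFu)                    /* infinity or NaN */
        return sign | (0x7FFULL << 52) | ((uint64_t)frac << 29);

    if (exp == 0) {
        int e = -126;                    /* exponent of a float subnormal */
        if (frac == 0)
            return sign;                 /* signed zero */
        /* Subnormal float: normalize it, since it is a normal double. */
        while ((frac & 0x800000u) == 0) {
            frac <<= 1;
            e--;
        }
        frac &= 0x7FFFFFu;               /* drop the now-implicit leading 1 */
        return sign | ((uint64_t)(e + 1023) << 52) | ((uint64_t)frac << 29);
    }

    /* Normal number: re-bias the exponent (1023 - 127 = 896) and pad
       the fraction with 29 zero bits. */
    return sign | ((uint64_t)(exp + 896u) << 52) | ((uint64_t)frac << 29);
}

/* A few known bit patterns as a self-check. */
void check_widen_f32_bits(void)
{
    assert(widen_f32_bits(0x3F800000u) == 0x3FF0000000000000ULL); /* 1.0f */
    assert(widen_f32_bits(0x406CCCCDu) == 0x400D9999A0000000ULL); /* 3.7f */
    assert(widen_f32_bits(0x80000000u) == 0x8000000000000000ULL); /* -0.0f */
}
```

No rounding is needed anywhere, because every float value is exactly representable as a double.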

OTHER TIPS

I know in C/Java, float point number's underlying represent is IEEE754-32, double point's is IEEE754-64.

Wrong. The C standard has never specified fixed sizes for the integer and floating-point types, although it does ensure the following relations between types:

1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
sizeof(float) <= sizeof(double) <= sizeof(long double)

C implementations are allowed to use any floating-point format, although most now use IEEE-754 and its descendants. Likewise, they have historically been free to use any of the permitted integer representations, such as 1's complement or sign-magnitude (C23 now requires two's complement).
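If your code depends on the formats, you can check rather than assume. The sketch below uses only standard macros from <float.h> and <limits.h>, plus the optional __STDC_IEC_559__ macro that an implementation defines when it claims IEC 60559 (IEEE-754) conformance. The function names are hypothetical, and matching parameters only suggest that the format is IEEE754-32 or IEEE754-64; they do not prove full conformance.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* 1 if float has the radix, precision, exponent range and size of IEEE754-32. */
int float_looks_like_binary32(void)
{
    return FLT_RADIX == 2 && FLT_MANT_DIG == 24
        && FLT_MAX_EXP == 128 && FLT_MIN_EXP == -125
        && sizeof(float) * CHAR_BIT == 32;
}

/* 1 if double has the radix, precision, exponent range and size of IEEE754-64. */
int double_looks_like_binary64(void)
{
    return FLT_RADIX == 2 && DBL_MANT_DIG == 53
        && DBL_MAX_EXP == 1024 && DBL_MIN_EXP == -1021
        && sizeof(double) * CHAR_BIT == 64;
}

/* 1 if the implementation itself claims IEC 60559 support. */
int claims_iec_559(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* The size relations quoted above, checked for the current implementation. */
void check_size_ordering(void)
{
    assert(sizeof(char) == 1);
    assert(sizeof(short) <= sizeof(int));
    assert(sizeof(int) <= sizeof(long));
    assert(sizeof(float) <= sizeof(double));
    assert(sizeof(double) <= sizeof(long double));
}
```

On mainstream desktop and server targets both "looks like" functions should return 1, but that is a property of those implementations, not a promise of the language.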

As for the promotion rules, pre-standard versions of C promoted floats in expressions to double, but in C89/90 the rule was changed, and float * float now yields a float result:

If either operand has type long double, the other operand is converted to long double.
Otherwise, if either operand is double, the other operand is converted to double.
Otherwise, if either operand is float, the other operand is converted to float.

Implicit type conversion rules in C++ operators
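You can see the three rules quoted above in action with C11's _Generic, which selects on the static type of an expression. This is just a sketch: TYPE_NAME and the check function are made-up names. It shows the type the language assigns to the result, which says nothing about whether the hardware computes it in wider registers internally.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: name the static type of an expression (C11). */
#define TYPE_NAME(expr) _Generic((expr), \
    float: "float",                      \
    double: "double",                    \
    long double: "long double",          \
    default: "other")

/* One check per quoted rule, plus an integer operand for comparison. */
void check_usual_arithmetic_conversions(void)
{
    float f = 3.7f;
    double d = 3.7;
    long double ld = 3.7L;

    assert(strcmp(TYPE_NAME(f * ld), "long double") == 0);
    assert(strcmp(TYPE_NAME(f * d), "double") == 0);
    assert(strcmp(TYPE_NAME(f * f), "float") == 0);  /* no promotion to double */
    assert(strcmp(TYPE_NAME(f * 2), "float") == 0);  /* the int is converted */
}
```

So in standard C, 3.7f only becomes a double when the other operand (or the destination) is a double.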

The assumption about IEEE754-32 and IEEE754-64 would hold in Java or C#, though, since they run bytecode in a virtual machine and the VM's types are consistent across platforms.

Licensed under: CC-BY-SA with attribution