Converting float to double

https://stackoverflow.com/questions/1421684

07-07-2019

Question

How expensive is the conversion of a float to a double? Is it as trivial as an int to long conversion?

EDIT: I'm assuming a platform where float is 4 bytes and double is 8 bytes.

Solution

Platform considerations

This depends on the platform used for floating-point computation. With the x87 FPU the conversion is free, because the register content is the same. The only price you may sometimes pay is memory traffic, and in many cases there is not even that, as you can simply use the value without any conversion. x87 is actually a strange beast in this respect: it is hard to properly distinguish between floats and doubles on it, because the instructions and registers used are the same. What differs are the load/store instructions, and the computation precision itself is controlled by the precision-control bits of the FPU control word. Mixed float/double computations may give unexpected results (and there are compiler command-line options to control the exact behaviour and optimization strategies because of this).

When you use SSE (and Visual Studio sometimes uses SSE by default), it may be different, as you may need to transfer the value to or from the FPU registers, or execute an explicit instruction to perform the conversion.
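As a rough illustration of the conversion being discussed, here is a minimal sketch. The names widen and widen_array are made up for this example. With SSE2 code generation, compilers typically emit one scalar convert instruction (cvtss2sd) per element for code like this, but the exact output depends on your compiler and flags, so check the disassembly on your own setup.

```cpp
#include <cstddef>

// Hypothetical helper: widen one float to double.
// The conversion preserves the value: every float is exactly
// representable as a double (assuming IEEE 754 formats).
double widen(float x) {
    return x;  // implicit float -> double conversion
}

// Hypothetical helper: widen a whole array element by element.
void widen_array(const float* in, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i];  // one conversion per element
    }
}
```

Whether the conversion shows up as a separate instruction at all depends on the platform, which is the whole point of the answer above.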

Memory savings performance

As a summary, and in answer to your comment elsewhere: if you want to store the results of floating-point computations in 32-bit storage, the result will be the same speed or faster, because:

If you do this on x87, the conversion is free - the only difference is that fstp dword[] will be used instead of fstp qword[].
If you do this with SSE enabled, you may even see some performance gain, as some float computations can be done with SSE once the precision of the computation is only float instead of the default double.
In all cases the memory traffic is lower.
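The storage case from the summary above can be sketched like this. The function name scale_to_float and the scaling computation are placeholders invented for the example; the only point is that the arithmetic runs in double while the results land in 4-byte floats.

```cpp
#include <vector>

// Hypothetical example: compute in double, store in 32-bit float.
// The output takes half the memory of a std::vector<double> on
// platforms where float is 4 bytes and double is 8 bytes.
std::vector<float> scale_to_float(const std::vector<double>& in, double factor) {
    std::vector<float> out;
    out.reserve(in.size());
    for (double v : in) {
        // The narrowing conversion rounds to the nearest float.
        out.push_back(static_cast<float>(v * factor));
    }
    return out;
}
```

Note that narrowing to float can lose precision, so this only makes sense when float precision is enough for the stored results.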

OTHER TIPS

Float to double conversions happen for free on some platforms (PPC, x86 if your compiler/runtime uses the "to hell with what type you told me to use, I'm going to evaluate everything in long double anyway, nyah nyah" evaluation mode).

On an x86 environment where floating-point evaluation is actually done in the specified type using SSE registers, conversions between float and double are about as expensive as a floating-point add or multiply (i.e., unlikely to be a performance consideration unless you're doing a lot of them).

In an embedded environment that lacks hardware floating-point, they can be somewhat costly.

This is specific to the C++ implementation you are using. In C++, a floating-point literal without a suffix has type double. A compiler may issue a warning for the following code:

float a = 3.45;

because the double value 3.45 is being converted to a float. If you need to use float specifically, suffix the value with f:

float a = 3.45f;

The point is, all floating-point literals are double by default. It's safe to stick to this default if you are not sure of the implementation details of your compiler and don't have a solid understanding of floating-point computation. Avoid the cast.
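To make the literal types concrete, here is a small sketch. The function name literals_differ is made up for the example; the static_assert lines only check what the language already specifies.

```cpp
#include <type_traits>

// An unsuffixed floating-point literal is a double; the f suffix makes it a float.
static_assert(std::is_same<decltype(3.45), double>::value, "3.45 is a double");
static_assert(std::is_same<decltype(3.45f), float>::value, "3.45f is a float");

// Hypothetical helper: 3.45 is not exactly representable in binary,
// so the float and double roundings of it are different values.
bool literals_differ() {
    float a = 3.45f;
    double b = 3.45;
    return static_cast<double>(a) != b;  // widening a is exact, b is closer to 3.45
}
```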

Also see section 4.5 of The C++ Programming Language.

I can't imagine it'd be too much more complex. The big difference between converting int to long and converting float to double is that the int types have two components (sign and value) while floating point numbers have three components (sign, mantissa, and exponent).

IEEE 754 single precision is encoded in 32 bits using 1 bit for the sign, 8 bits for the exponent, and 23 bits for the significand. However, it uses a hidden bit, so the significand is 24 bits (p = 24), even though it is encoded using only 23 bits.

-- David Goldberg, What Every Computer Scientist Should Know About Floating-Point Arithmetic

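So, converting from float to double keeps the same sign bit, copies the float's 23 stored mantissa bits into the top 23 of the double's 52 mantissa bits (the remaining 29 low bits become zero), and re-biases the float's 8-bit exponent (bias 127) into the double's 11-bit exponent (bias 1023). Zeros, subnormals, infinities and NaNs need slightly different handling.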
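For illustration, the bit manipulation can be written out by hand. This is only a sketch under some assumptions: float and double are IEEE 754 binary32 and binary64, integers and floating-point values share the same byte order, and the input is a normal number (not zero, subnormal, infinity or NaN). The name widen_bits is made up for the example; real code should just use the built-in conversion.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: build the double bit pattern from a float bit pattern.
// Only valid for normal floats (stored exponent field in 1..254).
double widen_bits(float f) {
    std::uint32_t fb;
    std::memcpy(&fb, &f, sizeof fb);  // read the raw 32 bits

    std::uint64_t sign     = fb >> 31;            // 1 bit
    std::uint64_t exponent = (fb >> 23) & 0xFF;   // 8 bits, bias 127
    std::uint64_t fraction = fb & 0x7FFFFF;       // 23 stored mantissa bits

    std::uint64_t db = (sign << 63)
                     | ((exponent - 127 + 1023) << 52)  // re-bias to 11 bits
                     | (fraction << 29);                // pad 29 low zero bits

    double d;
    std::memcpy(&d, &db, sizeof d);
    return d;
}
```

For normal inputs this should give the same result as a plain `double d = f;`, which is one way to see that the conversion does not change the value.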

This behavior may even be guaranteed by IEEE 754... I haven't checked it, so I'm not sure.

Probably a bit slower than converting int to long, as the memory required is larger and the manipulation is more complex. A good reference about memory alignment issues

Maybe this helps:

#include <cstdio>
#include <cstdlib>

// Converts by printing the float as decimal text and parsing the text back.
// This rounds to 5 decimal places, so it changes the value slightly.
double ftod(float fValue)
{
  char czDummy[30];
  std::snprintf(czDummy, sizeof czDummy, "%9.5f", fValue);
  return std::strtod(czDummy, NULL);
}


int main()
{
  float fValue = 250.84f;
  double dValue = ftod(fValue);  // round trip through decimal text
  double dValue2 = fValue;       // direct conversion, exact
  std::printf("%f\n", dValue);   // 250.840000
  std::printf("%f\n", dValue2);  // 250.839996 (the value the float actually holds)
  return 0;
}
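One way to check which of the two results is the faithful one is to widen the float and narrow it again. The sketch below uses two made-up helpers, round_trips and show, and assumes IEEE 754 float and double. The direct conversion gets the original float back, which suggests that 250.839996 is simply the value the float held all along, and that 250.840000 comes from rounding the decimal text.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: true when float -> double -> float returns the
// original value. Holds for every float except NaN (NaN never compares equal).
bool round_trips(float f) {
    double d = f;                       // direct, exact widening
    return static_cast<float>(d) == f;  // narrowing recovers the float
}

// Hypothetical helper: format a double with six decimals, like printf("%f").
std::string show(double d) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.6f", d);
    return std::string(buf);
}
```

For example, show(250.84f) formats the widened float, while show(250.84) formats the double literal, and the two strings differ in the last digits.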

Licensed under: CC-BY-SA with attribution
