Question

Float multiplications whose results fall below FLT_MIN seem to be very slow compared to other float multiplications. Running the sample code below on my Linux machine, I got the following results:

Elapsed time for 1E09 iterations of  0 * 0.900000 : 2.623269 s 
Elapsed time for 1E09 iterations of  1.17549e-38 * 0.900000 : 73.851011 s 
Elapsed time for 1E09 iterations of  2.35099e-38 * 0.900000 : 2.637788 s 
Elapsed time for 1E09 iterations of  0.00870937 * 0.900000 : 2.632788 s 
Elapsed time for 1E09 iterations of  1 * 0.900000 :  2.654571 s 
Elapsed time for 1E09 iterations of  3.40282e+38 * 0.900000 : 2.639316 s 

The operation 1.17549e-38 * 0.9 seems to take at least 25 times longer than the other tested multiplication operations. Is this a well-known issue?

In a time-critical project in which a large number of such multiplications, potentially producing results below FLT_MIN, need to be performed, what would be a fast way to work around this problem? (I can't afford to check every value before multiplying it, but I could tolerate an error on the order of 1e-5 in the multiplication result.)

#include <sys/time.h>
#include <stdio.h>
#include <float.h>

#define N_VALS 6
#define ALMOST_MIN (FLT_MIN * 2)   /* twice the smallest normalized float, still normal */

/* Compute result = stop - start for two timevals. */
void timeval_subtract(struct timeval *result, struct timeval *start, struct timeval *stop)
{
  long int sdiff = stop->tv_sec - start->tv_sec;
  long int udiff = stop->tv_usec - start->tv_usec;
  if (udiff < 0)
  {
    udiff = 1000000 + udiff;
    sdiff--;
  }
  result->tv_sec = sdiff;
  result->tv_usec = udiff;
}

int main()
{
  float values[N_VALS] = {0.0f, FLT_MIN, ALMOST_MIN, 0.00870937f, 1.0f, FLT_MAX};
  volatile float out;              /* volatile so the multiply is not optimized away */
  float mul = 0.9f;
  int i, j;
  struct timeval t_start, t_stop, t_elaps;

  for (j = 0; j < N_VALS; j++)
  {
    gettimeofday(&t_start, NULL);
    for (i = 0; i < 1000000000; i++)
      out = values[j] * mul;

    gettimeofday(&t_stop, NULL);
    timeval_subtract(&t_elaps, &t_start, &t_stop);
    printf("Elapsed time for 1E09 iterations of  %g * %f : %ld.%06ld s \n",
           values[j], mul, (long)t_elaps.tv_sec, (long)t_elaps.tv_usec);
  }
  return 0;
}

Solution

The reason it takes much longer to do .9 * FLT_MIN is that the result is smaller than the smallest normalized value a float can represent, i.e. it is a subnormal (denormal) number. Producing it falls off the fast hardware path: depending on the CPU, the processor either takes a slow microcode assist or raises an exception that is handled in software by the OS, possibly involving calls into user space. Either way, that takes far longer than a simple floating-point multiply done entirely in hardware.
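As a standalone illustration (not part of the original benchmark), you can check whether a result is subnormal with fpclassify from C99's <math.h>:

#include <stdio.h>
#include <math.h>
#include <float.h>

int main(void)
{
  float slow = FLT_MIN * 0.9f;         /* below FLT_MIN -> subnormal, slow path */
  float fast = FLT_MIN * 2.0f * 0.9f;  /* still a normal number, fast path */

  printf("%g is %s\n", slow,
         fpclassify(slow) == FP_SUBNORMAL ? "subnormal" : "normal");
  printf("%g is %s\n", fast,
         fpclassify(fast) == FP_SUBNORMAL ? "subnormal" : "normal");
  return 0;
}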

How to fix it? That depends on your platform and build tools. If you are using gcc, it can emit code and startup settings that let the CPU skip the slow subnormal handling, depending on the flags you pass. Look at the gcc manual for -ffast-math and the related floating-point optimization flags. Note that using these flags can produce results that do not comply exactly with the IEEE floating-point spec.
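If you would rather not compile the whole program with -ffast-math, the same effect can be obtained at run time on x86 by setting the SSE flush-to-zero (FTZ) and denormals-are-zero (DAZ) control bits yourself. The sketch below uses the standard SSE intrinsics headers; given the stated tolerance of about 1e-5, replacing subnormal values with zero should be acceptable, but verify this on your own data.

#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */

/* Call once at startup (and in each thread doing the math):
   afterwards subnormal results and inputs are treated as zero,
   so the multiplications stay on the fast hardware path. */
static void enable_fast_subnormals(void)
{
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);       /* subnormal results -> 0 */
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); /* subnormal inputs  -> 0 */
}

On x86-64 Linux, gcc uses SSE instructions for float arithmetic by default, so these settings apply to ordinary C code; they are per-thread and only affect SSE/AVX math, not the legacy x87 unit.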
