Question

If I want to take the product of a list of floating point numbers, what's the worst-case/average-case precision lost by adding their logs and then taking exp of the sum as opposed to just multiplying them. Is there ever a case when this is actually more precise?

Was it helpful?

Solution

Absent any overflow or underflow shenanigans, if a and b are floating-point numbers, then the product a*b will be computed to within a relative error of 1/2 ulp.

A crude bound on the relative error after multiplying a chain of N doubles therefore results in answer off by a factor of at most (1 - epsilon/2)-N, which is about exp(epsilon N/2). I'd imagine you can expect a deviation of around epsilon sqrt(N) in the average case. (To first order, this is about N epsilon.)

Exponent overflow and underflow are more likely to happen with this strategy, though; you're more likely to get infinities, zeroes, and NaNs as well as imprecise values due to rounding of subnormals.

The other approach is more robust in that sense, but it is much slower and errs worse in the case where the straightforward approach doesn't result in an overflow or underflow. Here's a very, very crude analysis for standard doubles in the case where N is at least a couple orders of magnitude smaller than 253:

You can always take the log of a finite floating-point number and get a finite floating-point number, so we're cool there. You can add up N floating-point numbers either straightforwardly to get N epsilon worst-case "relative" error and sqrt(N) epsilon expected "relative" error, or using Kahan summation to get about 3 epsilon worst-case "relative" error. Scare quotes are around "relative" because the error is relative to the sum of the absolute values of the things you're summing.

Notice that no finite double has a logarithm whose absolute value is bigger than 710 or so. That means our sum-of-logarithms computed using Kahan summation has an absolute error of at most 2130 N epsilon. When we exponentiate our sum-of-logarithms, we get something off by a factor of at most exp(2130 N epsilon) from the right answer.

A pathological examples for the log-sum-exp approach:

int main() {
  double foo[] = {0x1.000000000018cp1023, 0x1.0000000000072p-1023};
  double prod = 1;
  double sumlogs = 0;
  for (int i = 0; i < sizeof(foo) / sizeof(*foo); i++) {
    prod *= foo[i];
    sumlogs += log(foo[i]);
  }
  printf("%a %a\n", foo[0], foo[1]);
  printf("%a %a %a\n", prod, exp(sumlogs), prod - exp(sumlogs));
}

On my platform, I get a difference of 0x1.fep-44. I'm sure there are worse examples.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top