سؤال

I am trying to understand the double precision numbers in Matlab. Why is this 1 - 3*(4/3 - 1) not equal to zero?

هل كانت مفيدة؟

المحلول

The real number 4/3 is not representable in double precision (or any other binary floating-point format) because it is not a dyadic rational. Thus, when you compute 4/3 in MATLAB, the value you get is rounded to the closest representable double-precision number, which is exactly:

1.3333333333333332593184650249895639717578887939453125

Subtracting 1 from this value is exact (it is a well-known theorem of FP error-analysis that subtracting numbers within a factor of two of each other is exact), so the result of 4/3 - 1 is:

0.3333333333333332593184650249895639717578887939453125

It happens that the result of multiplying this number by three is also exactly representable:

0.9999999999999997779553950749686919152736663818359375

Finally, subtracting from 1.0 is also exact (by the theorem I referenced earlier):

0.0000000000000002220446049250313080847263336181640625

So there is only a single source of rounding error in your computation, owing to the fact that 4/3 cannot be represented as a double, and the final result of the computation is simply that initial error carried forward.

نصائح أخرى

Running this on a scientific calculator - log2(2.2204e-16) - yields (almost) exactly -52. In other words, Matlab stores 52 bits of precision in a double, with another 5 bits for the exponent (4 + sign) and one for the sign of the significand. This is in line with the IEEE 754 implementation: 53 bits for the significand (52 + sign) and 5 for the exponent. All is well! As always, you should test whether two floating point numbers are close enough, not whether they are exactly equal. An appropriate example in Matlab would look like:

if abs(x - y) < 1e-15
      % Some code to execute when x and y are approximately equal
else
      % Some other code
end

The statement of adherence to IEEE 754 comes from the wikipedia article.

The trouble starts with the calculation of 4/3. The exact answer is representable in neither a finite number of decimal digits nor a finite number of bits. The result will be stored as 4/3+r where r is a small absolute value signed number representing the difference between the real value of 4/3 and the closest IEEE 754 64-bit binary float to 4/3, the rounding error.

Subtracting 1 results in 1/3+r. Multiplying by 3 gets 1+3r. Subtracting it from 1 gets -3r.

The end result is -3 times the rounding error in the original representation of 4/3.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top