Question

In C++ programming, when do I need to worry about floating-point precision issues? Here is a small example (it might not be a perfect one):

std::vector<double> first (50000, 0.0);
std::vector<double> second (first);

Is it possible that second[619] ends up as something like 0.00000000000000000000000000001234 (i.e., a very small nonzero value)? Or that SUM = second[0]+second[1]+...+second[49999] comes out as 1e-31? Or that SUM = second[0]-second[1]-...-second[49999] comes out as -7.987654321e-12?

My questions:

  1. Can small disturbances occur when working with numbers of type double?
  2. What may cause these kinds of small disturbances, i.e., what makes rounding errors become large? Could you please list the causes? How can I take precautions?
  3. If certain operations can introduce small disturbances, does that mean that using if (SUM == 0) after these operations is dangerous? Should one then always use if (SUM < SMALL) instead, where SMALL is a very small value such as 1E-30?
  4. Lastly, can the small disturbances produce a negative value? If so, I had better use if (abs(SUM) < SMALL) instead.

Any experiences?


Solution

This is a good reference on floating-point precision: What Every Computer Scientist Should Know About Floating-Point Arithmetic (David Goldberg)

One of its more important sections covers catastrophic cancellation:

Catastrophic cancellation occurs when the operands are subject to rounding errors. For example, in the quadratic formula the expression b² - 4ac occurs. The quantities b² and 4ac are subject to rounding errors since they are the results of floating-point multiplications. Suppose that they are rounded to the nearest floating-point number, and so are accurate to within .5 ulp. When they are subtracted, cancellation can cause many of the accurate digits to disappear, leaving behind mainly digits contaminated by rounding error. Hence the difference might have an error of many ulps. For example, consider b = 3.34, a = 1.22, and c = 2.28 in decimal arithmetic with 3 significant digits. The exact value of b² - 4ac is .0292. But b² rounds to 11.2 and 4ac rounds to 11.1, hence the final answer is .1, which is an error of 708 ulps (.1 - .0292 = .0708), even though 11.2 - 11.1 is exactly equal to .1. The subtraction did not introduce any error, but rather exposed the error introduced in the earlier multiplications.

Benign cancellation occurs when subtracting exactly known quantities. If x and y have no rounding error, then by Theorem 2 of that paper, if the subtraction is done with a guard digit, the difference x - y has a very small relative error (less than 2ε, where ε is the machine epsilon).

A formula that exhibits catastrophic cancellation can sometimes be rearranged to eliminate the problem. Again consider the quadratic formula:

    r1 = (-b + sqrt(b² - 4ac)) / (2a),    r2 = (-b - sqrt(b² - 4ac)) / (2a)

OTHER TIPS

For your specific example, 0 has an exact representation as a double, and adding exactly 0 to a double does not change its value.

Also, like any other values you put in variables, the numbers you store in the array are not going to change mysteriously. You only get rounding when a value cannot be represented exactly as a floating-point number: either the result of a calculation, or a decimal literal such as 0.1 that has no exact binary representation.

To give a better opinion about "disturbances" I would need to know the kinds of calculations that your code performs.

Licensed under: CC-BY-SA with attribution