Question

After an hour of hunting for a bug in my code, I finally found the cause. I was trying to add a very small float to 1f, but nothing happened. While investigating, I found that adding the same small float to 0f worked perfectly.

Why is this happening? Does this have to do with 'orders of magnitude'? Is there any workaround for this problem?

Thanks in advance.

Edit:

Changing to double precision or decimal is not an option at the moment.


Solution

A single-precision (32-bit) floating-point value has only about 7 significant decimal digits of precision (a 24-bit significand). This means the value you are adding is essentially zero, at least when added to 1. The value itself, however, can easily be stored in a float, since its exponent is small. But to add it to 1, it has to be expressed with the exponent of the larger number, and then its digits fall beyond the end of the significand and are lost in rounding.
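A minimal sketch of that effect, in Python. Python's own float is a double, so the hypothetical helper f32 (not part of the question's code) rounds through the standard struct module to imitate a 32-bit float; 1e-8 is an assumed stand-in for the "very small float".

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

one = f32(1.0)
tiny = f32(1e-8)  # assumed example value; representable on its own

sum_with_zero = f32(0.0 + tiny)  # single-precision result of 0f + tiny
sum_with_one = f32(one + tiny)   # single-precision result of 1f + tiny

print(sum_with_zero == tiny)  # True: the small value survives next to 0
print(sum_with_one == one)    # True: it is rounded away next to 1
```

The gap between 1f and the next larger float is about 1.19e-7, so anything smaller than half of that disappears when added to 1f.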

You can use double if you need more precision. Performance-wise this usually makes little difference on today's hardware, and memory is often not so constrained that you have to think about every single variable.

EDIT: Since you stated that using double is not an option, you could use Kahan summation, as akuhn pointed out in a comment.
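A rough sketch of Kahan summation using only single-precision values, again in Python with a hypothetical f32 helper that rounds every intermediate result to 32 bits. The input (one 1.0 followed by 100,000 copies of 1e-8) is an assumed example, not data from the question.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum(values):
    """Plain running sum, rounded to single precision after every addition."""
    total = f32(0.0)
    for v in values:
        total = f32(total + f32(v))
    return total

def kahan_sum(values):
    """Kahan (compensated) summation with every step rounded to single precision."""
    total = f32(0.0)
    comp = f32(0.0)  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(f32(v) - comp)          # apply the correction to the next term
        t = f32(total + y)              # low-order bits of y may be lost here
        comp = f32(f32(t - total) - y)  # recover what was lost (negated)
        total = t
    return total

values = [1.0] + [1e-8] * 100_000  # assumed example input

naive = naive_sum(values)
compensated = kahan_sum(values)

print(naive)        # 1.0: every small term was rounded away
print(compensated)  # close to 1.001
```

The compensation variable carries the rounding error forward until it is large enough to change the total, which is why the small terms are no longer lost.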

Another option may be to perform intermediate calculations in double precision and cast the result back to float afterwards. This only helps, however, when there are more operations than just adding one very small number to a larger one.
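A sketch of that idea under the same assumptions as before: Python's float plays the role of double, and the hypothetical f32 helper stands in for the final cast to float. The inputs are made-up example values.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_in_double(values):
    """Accumulate in double precision, then round once to single precision."""
    acc = 0.0  # a Python float is a double
    for v in values:
        acc += f32(v)  # the inputs themselves are single-precision values
    return f32(acc)    # the only rounding to float happens here

many = sum_in_double([1.0] + [1e-8] * 100_000)
single = sum_in_double([1.0, 1e-8])

print(many)    # close to 1.001
print(single)  # 1.0: with one tiny term, the final cast still rounds it away
```

The second call shows the limitation mentioned above: if the final result differs from 1f by less than float can resolve, no intermediate precision can save it.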

OTHER TIPS

This probably happens because the number of digits of precision in a float is constant, but the exponent can obviously vary.

This means that although you can add your small number to 0, you cannot expect to add it to a number whose exponent is much larger than its own, since there just won't be enough digits of precision left to hold both.
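One way to see this is to measure the gap between a float and the next representable float at different magnitudes. The sketch below does that in Python by reinterpreting the 32-bit pattern as an integer; the function name spacing is hypothetical, and it assumes a non-negative, finite input.

```python
import struct

def spacing(x: float) -> float:
    """Gap between the single-precision value nearest x and the next one up.

    Assumes x is non-negative and finite.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - here

print(spacing(1.0))   # 2**-23, about 1.19e-07
print(spacing(1e-8))  # 2**-50, about 8.9e-16
print(spacing(0.0))   # 2**-149, the smallest positive float
```

A term smaller than half the spacing at the larger operand cannot change it, which is the case for 1e-8 next to 1f.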

You should read What Every Computer Scientist Should Know About Floating-Point Arithmetic.

It looks like it has something to do with floating point precision. If I were you, I'd use a different type, like decimal. That should fix precision errors.

With float, you only get about seven significant digits of precision, so the sum is rounded to 1f. If you need to store such a number, use double instead.

http://msdn.microsoft.com/en-us/library/ayazw934.aspx

In addition to the accepted answer: If you need to sum up many small numbers and some larger ones, you should use Kahan summation.

If performance is an issue (because you can't use double), then binary scaling/fixed-point may be an option. Values are stored as integers, scaled by a large factor (say, 2^16). Intermediate arithmetic is done with (relatively fast) integer operations. The final answer can be converted back to floating point at the end by dividing by the scaling factor.

This is often done if the target processor lacks a hardware floating-point unit.
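A minimal fixed-point sketch in Python. The names to_fixed and from_fixed are hypothetical, and the scale of 2^32 is an assumption chosen so that a step as small as 1e-8 is not rounded to zero (with 2^16 it would be). Python integers are unbounded; in a language with fixed-width integers this would need a 64-bit type.

```python
SCALE_BITS = 32          # assumed; resolution is 2**-32, about 2.3e-10
SCALE = 1 << SCALE_BITS

def to_fixed(x: float) -> int:
    """Convert a real value to a scaled integer (nearest representable)."""
    return round(x * SCALE)

def from_fixed(n: int) -> float:
    """Convert a scaled integer back to floating point."""
    return n / SCALE

total = to_fixed(1.0)
step = to_fixed(1e-8)    # quantized to 43 / 2**32, slightly more than 1e-8

for _ in range(100_000):
    total += step        # exact integer addition, nothing is rounded away

result = from_fixed(total)
print(result)            # close to 1.001, off by the quantization of step
```

The trade-off is that the error moves from the additions into the initial conversion: each step is stored with a fixed absolute error, which then accumulates exactly.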

You're using the f suffix on your literals, which will make these floats instead of doubles. So your very small float will vanish in the bigger float.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow