Question

Actually, in this question I am not asking about a particular language or architecture, though I understand there may be some differences.

In physics / engineering it is usually better to handle larger numbers than smaller ones, because of better absolute accuracy. For example, when I need to calculate the area of a circle from its diameter, I would use the equation S = d * d / 4 * pi rather than S = pi * (d / 2) * (d / 2) (in both cases calculating from left to right).

How does it look in programming languages?

Is it important to provide a "better" order of operands? Are there any compilers that can optimize calculations for this?

Do such constructions make sense:

// finding a result of a*b/c
if (abs(c) < 1) {
    result = a / c * b;
} else {
    result = a * b / c;
}

(in the example one should also test the values of a and b, but let's assume they are large numbers)?
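To see why the order of operations can matter at all, here is a minimal Python sketch. The values are purely illustrative, and note that for avoiding overflow it is a large divisor, not a small one, that makes dividing first pay off:

```python
import math

# Illustrative values: with IEEE 754 doubles the largest finite value
# is about 1.8e308, so the intermediate product 1e200 * 1e200 would
# overflow to infinity even though the final quotient fits.
a, b, c = 1e200, 1e200, 1e150

multiply_first = a * b / c   # a * b -> inf, so the whole result is inf
divide_first = a / c * b     # 1e50 * 1e200, about 1e250: representable

assert math.isinf(multiply_first)
assert math.isfinite(divide_first)
```

The final mathematical value is the same either way; only the intermediate results differ, and it is the intermediates that can fall outside the representable range.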

I know that by using very large numbers I risk an overflow. I know there is a difference between integer numbers (which are better at addition/subtraction) and floats (which are better at multiplication/division).

There is also an issue with integer division. For example, in Pascal there is a common * multiplication operator: if both operands are Integer, the result is Integer too; if one of them is Real (equivalent to float), the result is Real. For division there are two operators: /, which always yields a Real, and div, which takes only Integers and yields an Integer. So in this language it would be better to calculate multiplications first and divisions later, because integer division loses the fractional part, and it is better to lose it later than earlier.
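The same effect can be reproduced with Python's integer division operator // (used here as an analogue of Pascal's div; the numbers are illustrative):

```python
a, b, c = 7, 10, 2

# Dividing last: the fractional part survives until the final step.
late_division = a * b // c    # 70 // 2 = 35

# Dividing first: 7 // 2 discards 0.5 before the multiplication.
early_division = a // c * b   # 3 * 10 = 30

assert late_division == 35
assert early_division == 30
```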

But for float numbers, which are stored with a mantissa and an exponent, the order of multiplications/divisions seems not to matter.

What I want to achieve is as exact a result as possible (speed is not important).


Solution

There are two cases to consider: multiplication (including division) and addition (including subtraction). Because floating point numbers are stored in exponential form (i.e. as m * 2^e), these operations are performed on the mantissa (m) and exponent (e) as separate values, not on the numbers as a whole.

Multiplication (basically) involves multiplying the mantissae (which are always between 1 and 2) and adding the exponents (which are integers). It follows that, unless the integer addition of the exponents overflows, the absolute magnitude of the numbers makes no difference to the precision of the result.
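You can inspect this decomposition with Python's math.frexp, which splits a float into mantissa and exponent (Python normalizes the mantissa to [0.5, 1) rather than [1, 2), but the idea is the same). A small illustrative sketch:

```python
import math

# 6.0 is stored as 0.75 * 2**3.
m, e = math.frexp(6.0)
assert (m, e) == (0.75, 3)

# Magnitude does not hurt multiplication: the mantissas of these two
# numbers (3 and 5, both exactly representable) multiply exactly, and
# the exponents -500 and +500 simply cancel.
x = 3.0 * 2**-500
y = 5.0 * 2**500
assert x * y == 15.0   # exact despite the wildly different magnitudes
```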

For addition, however, things are different. To add floating point numbers, one of the mantissae is shifted by an appropriate number of bits to make the exponents equal, and then the mantissae are added. This shifting operation loses precision in proportion to the difference in exponents. Thus, addition is more precise when the operands are closer in magnitude. In practice this means that if you are adding many floating point numbers that may vary wildly in magnitude, it is best to start with the smallest (closest to zero) and work up to the larger-magnitude numbers at the end. If you don't know the relative magnitudes in advance, put the numbers in an array and sort it by magnitude before summing; if you do know them, use an accumulator variable and add the smaller numbers first.
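A small Python demonstration of this, with illustrative values (with doubles, 1e16 + 1.0 rounds back to 1e16, because 1.0 is smaller than the gap between consecutive doubles near 1e16):

```python
values = [1e16] + [1.0] * 10

# Largest first: each 1.0 is swallowed by rounding.
large_first = 0.0
for v in values:
    large_first += v

# Smallest first: the small terms accumulate before meeting 1e16.
small_first = 0.0
for v in sorted(values, key=abs):
    small_first += v

assert large_first == 1e16          # the ten 1.0s were lost entirely
assert small_first == 1e16 + 10.0   # preserved
```

In Python specifically, math.fsum also exists and computes a correctly rounded sum regardless of input order, which sidesteps the problem at some cost in speed.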

OTHER TIPS

In physics / engineering it is usually better to handle greater numbers than smaller ones, because of better absolute accuracy.

I don't think this is true at all. There is indeed an issue when you add a small number to a big number. But regarding multiplication and division, there is a risk of overflow when you multiply numbers that are both much greater than 1, so for example

99999999999999999999999 / 9999999999999999999 * 999999999999999999999999999

is better than

99999999999999999999999 * 999999999999999999999999999 / 9999999999999999999

because the latter can (on some machines) result in an overflow.

On the other hand, if you divide an expression by a big number several times, you can get an underflow. So if you have a good idea of the order of magnitude of the numbers involved, it is a good idea to interleave multiplications and divisions to keep the intermediate results within the range of numbers the computer can represent.
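Underflow can be demonstrated the same way; a Python sketch with illustrative values:

```python
x = 1e-160

# Dividing first pushes the intermediate below the smallest subnormal
# double (about 5e-324), so it becomes exactly 0.0 and the value is
# gone for good; the later multiplication cannot recover it.
underflowed = x / 1e200 * 1e250    # 1e-360 -> 0.0, then * 1e250 -> 0.0

# Interleaving keeps every intermediate in range.
interleaved = x * 1e250 / 1e200    # about 1e90, then about 1e-110

assert underflowed == 0.0
assert interleaved != 0.0
```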

In your area calculation, S = pi * (d / 2) * (d / 2) is worse because it involves one more operation than S = d * d / 4 * pi.

In fact I do not have much knowledge of whether using large numbers is better or not, but I can share two points that I do know...

First is operator priority, which is something you probably know: some operators have higher priority than others (for example, * will be executed before + in 2 * 3 + 1 unless you use parentheses).
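In Python, for instance:

```python
# * binds tighter than +, so the multiplication happens first.
assert 2 * 3 + 1 == 7

# Parentheses override the default priority.
assert 2 * (3 + 1) == 8
```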

For exact results there are two points as well. First, float-to-int (or the reverse) conversion is specific to each programming language, and different languages may act differently. Even different versions of the same language may act differently.
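Even within one language the conversion rules are worth checking. In Python, for example, int() truncates toward zero, which differs from flooring for negative values:

```python
import math

assert int(-1.5) == -1          # truncation toward zero
assert math.floor(-1.5) == -2   # flooring rounds the other way
assert int(1.5) == 1            # for positives the two agree
```

Other languages (and other conversion functions in the same language) may round, floor, or truncate, so relying on the default behavior without checking is a common source of off-by-one results.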

The second point about exact results is the float type itself. It is well known that floating-point math can look broken, because computers calculate in binary (base 2), so numbers are represented in powers of two. It is not easy to represent 0.2 in powers of 2, and what you get is 0.001100110011001... repeating. Using multiplication or division many times on floating-point values causes this difference to grow bigger and bigger. David Goldberg's What Every Computer Scientist Should Know About Floating-Point Arithmetic has great information on that topic.
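The classic demonstration in Python:

```python
# 0.1 and 0.2 have no finite binary representation, so the stored
# values are tiny approximations and the error shows up immediately:
assert 0.1 + 0.2 != 0.3
print(0.1 + 0.2)   # 0.30000000000000004

# Repeating operations lets the representation error accumulate:
total = 0.0
for _ in range(10):
    total += 0.1
assert total != 1.0
```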

Python has Decimal, which can calculate with and store the exact value of a number written in decimal. Most programming languages have a similar data type.
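A short sketch of Python's decimal module; note that Decimal must be constructed from a string (or integer) to capture the intended decimal value, because building it from a float inherits the float's binary error:

```python
from decimal import Decimal

a = Decimal("0.1")
b = Decimal("0.2")
assert a + b == Decimal("0.3")   # exact, unlike 0.1 + 0.2 with floats

# Constructing from the float 0.1 captures the binary approximation:
assert Decimal(0.1) != Decimal("0.1")
```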

Licensed under: CC-BY-SA with attribution