Question

I am trying to build a deterministic simulation, where the floating-point numbers are truncated via the following function (I found it here: http://joshblog.net/2007/01/30/flash-floating-point-number-errors ):

return Math.round(10000 * float) / 10000;

My question is: wouldn't the fact that I am dividing by 10000 be, in itself, a source of floating-point errors? I.e. every time a division takes place, it will result in a new float with further possible nondeterministic results.

EDIT: How about this, using only powers of 2?

return Math.round(float* 1024) / 1024;

Solution

My goal was to achieve greater consistency across platforms (C#/AS3 and 32/64-bit), while accepting that 100% consistency is not possible (due to AS3 not being capable of real integer math, as all internal operations are performed via floats).

What I have gathered so far (thanks to Eric Postpischil and Jeffrey Sax):

Math.round(1024 * float) / 1024;

Out of the above, the "Math.round(1024 * float)" operation may NOT produce identical results on all platforms if the "errors have accumulated to more than half of the quantum", which is possible even "within a single operation".

  • While this is mathematically possible, it is likely to be very rare, so overall this operation will still eliminate more inconsistencies than it generates. It is therefore worthwhile to perform, as it will reduce inconsistencies across platforms (though it cannot eliminate them).


As for the "/ 1024" part: since 1024 is a power of 2, the division only adjusts the exponent and will NOT introduce extra errors, whereas dividing by 1000 carries a small chance of an extra error, because the quotient cannot always be represented exactly. So a division by 1000 could introduce another rounding error after the rounding step, which a division by 1024 cannot.
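
To make this concrete, here is a minimal C sketch (mine, not from the original discussion; 0.1 is just an illustrative input) showing that the power-of-two quotient is stored exactly, while the power-of-ten quotient is only the nearest representable double:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double x = 0.1;

    /*  0.1 * 1024 rounds to 102, and 102/1024 (= 51/512) is a dyadic
        fraction, so the division is exact.  */
    double q2 = round(x * 1024) / 1024;

    /*  0.1 * 1000 rounds to 100, but 100/1000 has no finite binary
        representation, so the division has to round once more.  */
    double q10 = round(x * 1000) / 1000;

    printf("quantum 1/1024: %a\n", q2);    /* 0x1.98p-4, exactly 102/1024 */
    printf("quantum 1/1000: %a\n", q10);   /* 0x1.999999999999ap-4, nearest double to 100/1000 */

    return 0;
}

The hexadecimal output makes the difference visible: the /1024 result terminates after a few bits, while the /1000 result is only an approximation of the intended grid point.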


CONCLUSION: Math.round(1024 * float) / 1024; is better than Math.round(1000 * float) / 1000; although neither of them is perfect.

Is this an accurate statement?

OTHER TIPS

When you say deterministic, I'm assuming you want a reproducible simulation, where you get the exact same results every time you run the simulation.

To make this happen, you need to find the source of possible variation and eliminate it.

Ultimately, the only way to do that is to compile to a binary for a specific architecture.

Floating-point arithmetic itself is fully specified. The floating-point standards (IEEE-754) are followed by all modern processors and leave no ambiguity.

There are two main sources of variation:

  1. Differences in instruction sets. This is the most obvious one. If you compile your application for 32-bit or 64-bit, you will likely get slightly different results. 32-bit applications tend to use older-style x87 instructions, which use 80-bit intermediate values. This causes some results to be rounded differently. Even within x86 there are differences if you use SSE instructions, which work on multiple operands at once. Some compilers may generate code that depends on how operands are aligned in memory.

  2. Differences in instruction ordering. Mathematically, (a+b)+c and a+(b+c) are equivalent (addition is associative). In floating-point calculations this is not the case. If a is one, b is minus one, and c is a tiny number such that 1+c gets rounded to 1, then the expressions evaluate to c and 0, respectively (see the sketch after this list). It is the compiler that decides which instructions to use. Depending on your language and platform, it may be the language compiler or the Just-in-Time IL/bytecode compiler. Either way, the compiler is a black box, and it may change the way it compiles code without our knowledge. The smallest difference can lead to a different end result.
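
For concreteness, here is a small C sketch of the associativity example from point 2 (the values are mine, chosen for illustration; c just has to be small enough to be lost when added to 1):

#include <stdio.h>

int main(void)
{
    /*  a = 1, b = -1, and c small enough that adding it to 1 (or -1)
        is lost to rounding.  */
    double a = 1.0, b = -1.0, c = 1e-17;

    double left  = (a + b) + c;   /* (1 - 1) + c  ->  c */
    double right = a + (b + c);   /* 1 + (-1)     ->  0 */

    printf("(a+b)+c = %g\n", left);    /* 1e-17 */
    printf("a+(b+c) = %g\n", right);   /* 0     */
    printf("equal?    %s\n", left == right ? "yes" : "no");

    return 0;
}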

The rounding approach looks nice in theory, but it doesn't work. No matter how you round, there are always cases where two different but equivalent sets of instructions produce a result that gets rounded differently.

The core reason is that rounding is not composable, in the sense that rounding to a digits, and then rounding to b (< a) digits is not equivalent to rounding to b digits from the beginning. For example: 1.49 rounded to one digit is 1.5 and rounding that to zero digits gives 2. But rounding to zero digits directly yields 1.
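
Here is the same decimal example as a small C sketch (mine, using the C library round, which rounds halfway cases away from zero):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double x = 1.49;

    /*  Round to one decimal digit first, then to zero digits:
        1.49 -> 1.5 -> 2.  */
    double staged = round(round(x * 10) / 10);

    /*  Round to zero digits directly: 1.49 -> 1.  */
    double direct = round(x);

    printf("staged: %g\n", staged);   /* 2 */
    printf("direct: %g\n", direct);   /* 1 */

    return 0;
}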

So, on an x87-based system, which uses 80-bit 'extended' precision for intermediate values, you start with 64 significant bits, and you round this down directly to your desired precision. If you have double-precision intermediates, you get the same intermediate result, but first rounded to 53 significant bits, which is then rounded to your desired precision. Because rounding is not composable, these two paths can yield different final results.

Your only option is to produce machine code for a specific architecture.

Now, if your goal is only to minimize the differences instead of completely eliminating them, then the answer is straightforward: dividing or multiplying by a power of two (like 1024) does not introduce any additional round-off error in the range used by your application, while multiplying and dividing by a number like 1000 does.

If you look at accumulating errors as a random walk, then using 1000 for rounding requires more steps than using 1024: both the multiplication and the division may introduce additional errors. So on average the total error will be larger, and you have a bigger chance that the rounding operation goes the wrong way. This is true even when you round after every operation.

Dividing by 10,000 results in a rounding error equal to the difference between the exact mathematical result and the nearest number representable in double-precision, assuming IEEE 754 binary floating-point arithmetic in round-to-nearest mode. This error is at most 1/2 ULP (unit of least precision) of the result.

Multiplying by a power of two, rounding to an integer, and dividing by the same power of two will not cause any errors in the rounding operations, with one exception: a multiplication whose exact result is around 2^1024 (the exact threshold is slightly lower) or greater will produce a floating-point infinity. (In general, multiplication or division by powers of two can produce rounding errors when the result underflows the floating-point range, that is, when the exact mathematical result is in (0, 2^-1022). However, underflow will not occur when calculating round(x*p)/p for p some positive power of 2 less than 2^1023.)
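
As a quick illustration of the overflow case (my own sketch; simulation values are unlikely to ever be this large):

#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void)
{
    /*  Near the top of the double range, multiplying by the quantum
        overflows, and the quantization no longer returns anything useful.  */
    double x = DBL_MAX;                    /* just below 2^1024 */
    double scaled = x * 1024;              /* overflows to +infinity */
    double quantized = round(scaled) / 1024;

    printf("x * 1024   = %g\n", scaled);      /* inf */
    printf("quantized  = %g\n", quantized);   /* inf, not x */

    return 0;
}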

Quantizing numbers in this way will not in general produce deterministic results. Deviations between two platforms may occur when the pre-quantization values have errors that cause them to straddle midpoints between quanta.

Here is code demonstrating that rounding to multiples of a quantum does not produce deterministic results, even when there is no error in scaling.

The output I get is:

Machine 0 produces 0x1p+0 (1).
Machine 1 produces 0x1.004p+0 (1.0009765625).
The results differ.

The source code is:

#include <stdio.h>
#include <math.h>


// Round a value to the nearest multiple of the quantum.
static double Quantize(double x)
{
    static const double Quantum = 1024., InverseQuantum = 1/Quantum;

    return round(x * Quantum) * InverseQuantum;
}


int main(void)
{
    /*  For this example, we are in the middle of some calculation, where we
        have some value a from earlier operations.  a0 and a1 represent the
        calculated values of a on two different platforms.  Observe that the
        difference is as small as possible, just a single ULP.
    */
    double a0 = 0x1.cbd9f42000000p0;
    double a1 = 0x1.cbd9f42000001p0;

    // Define a constant that the calculation uses.
    double b = 0x1.1d2b9fp-1;

    // Calculate the pre-quantization result on each machine.
    double x0 = a0 * b;
    double x1 = a1 * b;

    // Quantize the result on each machine.
    double y0 = Quantize(x0);
    double y1 = Quantize(x1);

    // Display the results.
    printf("Machine 0 produces %a (%.53g).\n", y0, y0);
    printf("Machine 1 produces %a (%.53g).\n", y1, y1);
    printf("The results %s.\n", y0 == y1 ? "are identical" : "differ");

    return 0;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow