Question

Having already read this question, I'm reasonably certain that a given process using floating point arithmetic with the same input (on the same hardware, compiled with the same compiler) should be deterministic. I'm looking at a case where this isn't true and trying to determine what could have caused it.

I've compiled an executable and I'm feeding it the exact same data, running on a single machine (non-multithreaded), but I'm getting errors of about 3.814697265625e-06, which after careful googling I found is actually equal to 1/4^9 = 1/2^18 = 1/262144. That is pretty close to the precision level of a 32-bit floating point number (approximately 7 significant digits, according to Wikipedia).

My suspicion is that it has something to do with optimisations that have been applied to the code. I'm using the Intel C++ compiler and have turned floating point speculation to fast instead of safe or strict. Could this make a floating point process non-deterministic? Are there other optimisations that could lead to this behaviour?

EDIT: As per Pax's suggestion, I recompiled the code with floating point speculation turned to safe and I'm now getting stable results. This allows me to clarify the question - what does floating point speculation actually do, and how can it cause the same binary (i.e. one compilation, multiple runs) to generate different results when applied to the exact same input?

@Ben I'm compiling using Intel(R) C++ 11.0.061 [IA-32] and I'm running on an Intel quadcore processor.

Was it helpful?

Solution

In almost any situation where there's a fast mode and a safe mode, you'll find a trade-off of some sort. Otherwise everything would run in fast-safe mode :-).

And, if you're getting different results with the same input, your process is not deterministic, no matter how much you believe it to be (in spite of the empirical evidence).

I would say your explanation is the most likely. Put it in safe mode and see if the non-determinism goes away. That will tell you for sure.

As to whether there are other optimizations, if you're compiling on the same hardware with the same compiler/linker and the same options to those tools, it should generate identical code. I can't see any other possibility other than the fast mode (or bit rot in the memory due to cosmic rays, but that's pretty unlikely).

Following your update:

Intel has a document here which explains some of the things they're not allowed to do in safe mode, including but not limited to:

  • reassociation: (a+b)+c -> a+(b+c).
  • zero folding: x + 0 -> x, x * 0 -> 0.
  • reciprocal multiply: a/b -> a*(1/b).

While you state that these operations are compile-time defined, the Intel chips are pretty darned clever. They can re-order instructions to keep pipelines full in multi-CPU set-ups so, unless the code specifically prohibits such behavior, things may change at run-time (not compile-time) to keep things going at full speed.

This is covered (briefly) on page 15 of that linked document that talks about vectorization ("Issue: different results re-running the same binary on the same data on the same processor").

My advice would be to decide whether you need raw grunt or total reproducibility of results, and then choose the mode based on that.

OTHER TIPS

If your program is parallelized, as it might be to run on a quad core, then it may well be non-deterministic.

Imagine that you have 4 processors adding a floating point value to the same memory location. Then you might get

(((InitialValue+P1fp)+P2fp)+P3fp)+P4fp

or

(((InitialValue+P2fp)+P3fp)+P1fp)+P4fp

or any of the other possible orderings.

Heck, you might even get

 InitialValue+(P2fp+P3fp)+(P1fp+P4fp)

if the compiler is good enough.

Unfortunately, floating point addition is not associative. Real number arithmetic is, but floating point is not, because of rounding, overflow, and underflow.

Because of this, parallel FP computation is often non-deterministic. "Often", because programs that look like

  on each processor
    while( there is work to do ) {
       get work
       calculate result
       add to total 
    }

will be non-deterministic, because the amount of time each unit of work takes may vary widely, so you can't predict the order of the additions. (It's worse if the threads interact.)

But not always, because there are styles of parallel programming that are deterministic.

Of course, what many folks who care about determinism do is work in integer or fixed point to avoid the problem. I am particularly fond of superaccumulators, 512, 1024, or 2048 bit numbers that floating point numbers can be added to, without suffering rounding errors.


As for a single threaded application: the compiler may rearrange the code. Different compilations may give different answers. But any particular binary should be deterministic.

Unless... you are working in a dynamic language, where the runtime may perform optimizations that reorder the FP computations, and those can vary over time.

Or unless... really long shot: Itanium had some features, like the ALAT, that made even single-threaded code non-deterministic. You are unlikely to be affected by these.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow