Can very large floating-point numbers cause non-determinism?

https://stackoverflow.com//questions/9579627

08-12-2019

Question

I'm running a C++ optimization program in a Win32 environment. The program uses pre-built DLLs for FFTW and pthreads.

Recently, the program was changed so that it can now encounter very large numbers and possibly infinity. After this change, this otherwise lean and robust system started to produce strange symptoms. Most notably, it produced different numerical results on different runs (on the same computer, with the same binary), and even adding a printf or a dummy allocation here and there changed its behaviour radically.

I double-checked for every possible buffer overrun, memory allocation problem, threading issue (I have now reduced the thread pool size to 1) and stack size problem, but after weeks of searching I found nothing. Before the change, the program had no non-determinism or stability issues; it regularly ran for days.

I wonder if the problem could lie in the FFTW module? Or can such floating-point instability stem from large numbers?

Solution

Floating-point numbers themselves won't cause non-determinism, but any third-party libraries you use may do so if they're buggy, for example by not handling infinities correctly.

You may also want to consider the possibility that your own code is the culprit. That's usually (though not always) the case when the third-party library is heavily used, since it's reasonable to assume that most of its bugs have already been found by someone else.

Whether FFTW lies within that category, I don't know. But it's certainly possible it may have been tested by more people than your own code :-)
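To test the "library mishandles infinities" theory, one option is to check buffers for non-finite values just before they are handed to the library. The following is only a minimal sketch: the names `first_non_finite` and `assert_all_finite` are hypothetical helpers, not part of FFTW or any other library, and the data is assumed to live in a `std::vector<double>`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first NaN or infinity in buf,
// or buf.size() if every element is finite.
std::size_t first_non_finite(const std::vector<double>& buf) {
    for (std::size_t i = 0; i < buf.size(); ++i) {
        if (!std::isfinite(buf[i])) {
            return i;
        }
    }
    return buf.size();
}

// Hypothetical guard to place in front of a library call. In debug builds
// (NDEBUG not defined) it aborts at the point where bad data first shows up.
void assert_all_finite(const std::vector<double>& buf) {
    assert(first_non_finite(buf) == buf.size());
}
```

If the guard fires before the library is ever called, the non-finite values originate in your own code, which narrows the search considerably.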

Other tips

Use Valgrind to find out if you are reading from uninitialized variables. They are the most common source of unwanted randomness and thus non-determinism.

Another possibility is multithreading (although you say you reduced the thread pool to one), for example a race condition between a control thread and a worker thread. Valgrind can also help check for potential races in multithreaded code.

Large numbers do not cause non-deterministic behavior, but they can magnify it -- what were previously small rounding differences can become the difference between a finite number and a NaN or infinity.
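A small illustration of that magnifying effect, assuming IEEE 754 doubles: the same three values summed in two different orders give either infinity or an ordinary finite number. The function names are made up for this sketch, and the `volatile` intermediates are only there to force each partial sum to be rounded to a plain `double`.

```cpp
#include <cassert>
#include <cmath>

// Adds a and b first, then c.
double sum_left_first(double a, double b, double c) {
    assert(std::isfinite(a) && std::isfinite(b) && std::isfinite(c));
    volatile double partial = a + b;  // may overflow to +inf
    return partial + c;
}

// Adds b and c first, then a.
double sum_right_first(double a, double b, double c) {
    assert(std::isfinite(a) && std::isfinite(b) && std::isfinite(c));
    volatile double partial = b + c;  // cancels before anything can overflow
    return a + partial;
}
```

With `a = 1e308`, `b = 1e308`, `c = -1e308`, the first function returns infinity (because `a + b` exceeds the largest double) while the second returns `1e308`. With small values the two orders would differ by at most a rounding error, so anything that changes the order of operations between runs only becomes visible once the numbers get large.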

One thing to look at is the alignment of the buffers being passed to FFTW. Like most high-performance numerical software, it may use different implementations depending on data alignment.
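To see whether alignment varies between runs, you could log it for each buffer before it is passed on. This is a generic sketch rather than anything FFTW-specific: `is_aligned` is a hypothetical helper, and the 16-byte boundary used in the usage note is an assumption about what a SIMD code path might require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if pointer p is a multiple of boundary bytes.
// boundary must be a power of two.
bool is_aligned(const void* p, std::size_t boundary) {
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (boundary - 1)) == 0;
}
```

For example, `is_aligned(buffer, 16)` can be printed at startup. If the answer changes from run to run, or after adding an unrelated allocation, that would fit the symptoms described in the question. Allocating the buffers with `fftw_malloc` instead of `new` or `malloc` is one way to get consistently aligned memory.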

I was looking for hints to solve a similar issue with floating-point values and non-deterministic behaviour in Java, and I ended up in this thread. I just want to share this LINK, which explains why C++ code can behave non-deterministically when it uses floating-point values that are close to overflow. The article states that the problems are caused by how the underlying compiler translates the code to machine code. Depending on whether the machine compares values that have already been truncated or values still held in CPU registers with more precision, we can get different behaviours. I hope this helps.
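The effect can be imitated explicitly by doing the same arithmetic once with a `double` intermediate and once with a `long double` intermediate. This is only a sketch of the mechanism, not code from the linked article: the function names are invented here, and it assumes a platform where `long double` has a larger exponent range than `double` (true for x87 extended precision, not true for every compiler).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Multiply by 10, then divide by 10, with the intermediate rounded to double.
double roundtrip_double(double x) {
    assert(std::isfinite(x));
    volatile double t = x * 10.0;  // overflows to +inf for x near 1e308
    return t / 10.0;
}

// Same arithmetic, but the intermediate stays in long double, imitating a
// value that is kept in a wider register instead of being stored to memory.
double roundtrip_wide(double x) {
    assert(std::isfinite(x));
    volatile long double t = static_cast<long double>(x) * 10.0L;
    return static_cast<double>(t / 10.0L);
}

// True if long double can represent larger magnitudes than double here.
constexpr bool long_double_is_wider =
    std::numeric_limits<long double>::max_exponent >
    std::numeric_limits<double>::max_exponent;
```

On a platform where `long_double_is_wider` is true, `roundtrip_double(1e308)` yields infinity while `roundtrip_wide(1e308)` yields a finite value close to `1e308`. When a compiler is free to choose between the two forms, small changes to the surrounding code can change which one it picks.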

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow