Question

I have a scenario where I'm trying to execute a complex application on both AIX and Linux.

During execution the code uses the intrinsic function sqrt() for computation, but the results obtained on the two machines differ.

Does anyone know the reason for this behavior? Is there any way to overcome it?

P.S.

Some values are equal on both machines, but the majority of them differ.

Solution

Processors that follow the IEEE 754 specification must return the exact result for square root (or correctly rounded when exact cannot be represented). For the same input values, floating point format, and rounding mode, different IEEE 754 compliant processors must return an identical result. No variation is allowed. Possible reasons for seeing different results:

  1. One of the processors does not follow the IEEE 754 floating point specification.
  2. The values are really the same, but a print related bug or difference makes them appear different.
  3. The rounding mode or precision control is not set the same on both systems (a quick way to check this is sketched after this list).
  4. One system attempts to follow the IEEE 754 specification but has an imperfection in its square root function.

Did you compare binary output to eliminate the possibility of a print formatting bug or difference?
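If not, a small sketch like the following takes decimal formatting out of the comparison entirely by printing the hex float form (%a) and the raw bit pattern of each value:

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <math.h>

/* Print the decimal value, the hex float form, and the raw bit pattern
   of a double so output from the two machines can be diffed exactly. */
static void dump(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the bytes, no conversion */
    printf("%.17g  %a  0x%016llx\n", x, x, (unsigned long long)bits);
}

int main(void)
{
    dump(sqrt(2.0));   /* substitute one of the differing inputs */
    return 0;
}
```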

Most processors today support IEEE 754 floating point. An example where IEEE 754 accuracy is not guaranteed is with the OpenCL native_sqrt function. OpenCL defines native_sqrt (in addition to IEEE 754 compliant sqrt) so that speed can be traded for accuracy if desired.

Bugs in IEEE 754 sqrt implementations are not too common today. A difficult case for an IEEE 754 sqrt function is when the rounding mode is set to nearest and the actual result is very near the midway point between two floating point representations. A method for generating these difficult square root arguments can be found in a paper by William Kahan, How to Test Whether SQRT is Rounded Correctly.
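As a rough spot check of a suspect value, you can compare sqrt() against sqrtl() rounded back to double. This is only a sketch: it assumes long double is wider than double on your platforms, and double rounding makes it a heuristic rather than a proof of correct rounding.

```c
#include <stdio.h>
#include <math.h>

/* Heuristic check: does sqrt() agree with sqrtl() rounded to double? */
int main(void)
{
    double x  = 2.0;                          /* substitute a value that differs */
    double d  = sqrt(x);
    double dl = (double)sqrtl((long double)x);
    printf("sqrt : %a\nsqrtl: %a\n%s\n", d, dl,
           d == dl ? "agree" : "DIFFER");
    return 0;
}
```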

OTHER TIPS

There may be slight differences in the numeric representation of the hardware on the two computers or in the algorithm used for the sqrt function of the two compilers. Finite precision arithmetic is not the same as the arithmetic of real numbers, and slight differences in calculations should be expected. To judge whether the differences are unusual, you should state the numeric type that you are using (as asked by ChuckCottrill) and give examples. What is the relative difference? For values of order unity, a relative difference around 1E-7 is expected for single precision floating point.
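A quick way to quantify one of the differing pairs is to compute the relative difference; in this sketch the two values are placeholders for the actual outputs from your two machines:

```c
#include <stdio.h>
#include <math.h>

/* Relative difference between the two machines' results for the same input. */
int main(void)
{
    double aix_result   = 1.4142135623730951;   /* hypothetical value */
    double linux_result = 1.4142135623730949;   /* hypothetical value */
    double rel = fabs(aix_result - linux_result) / fabs(linux_result);
    printf("relative difference = %g\n", rel);
    return 0;
}
```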

Check the floating point formats available on each CPU. Are you using single precision or double precision floating point? You need to use a floating point format with the same precision on both machines if you want comparable answers.

Floating point is an approximation. A single precision float is 32 bits: 1 sign bit, 8 exponent bits, and 23 stored mantissa bits (24 significant bits counting the implicit leading bit), which allows roughly 7 decimal digits of precision. Double precision floating point has a 53-bit significand, allowing roughly 15 to 16 decimal digits.
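A short illustration of how much the two formats can disagree for the same input (the input here is only an example):

```c
#include <stdio.h>
#include <math.h>

/* sqrtf() carries roughly 7 decimal digits, sqrt() roughly 15-16,
   so the two results differ when printed with full precision. */
int main(void)
{
    double x = 2.0;
    printf("float : %.17g\n", (double)sqrtf((float)x));
    printf("double: %.17g\n", sqrt(x));
    return 0;
}
```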

Lacking detail about the binary values of the differing floating point numbers on the two systems, and about their printed representations, the most likely explanation is a rounding or representation difference.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow