Android NDK: ARMv6 + VFP devices. wrong calculations, NaN, denormal numbers, VFP11 bug

Question 1

It seems I have identified the bug.

It is the denorm bug in the VFP11 (the ARMv6 floating-point coprocessor). Denormal numbers are very small numbers, too close to zero to be stored in normalized form.

I get these numbers in physics code that implements a spring with damping:

force1 = (Center - P1) * k1         // force1 directed to center 
force2 = - Velocity * k2            // force2 directed against velocity
Object->applyForce(force1)
Object->applyForce(force2)

Both forces get very small when the object reaches Center, and I end up with denormal values.

I can rewrite the spring and damping code, but I can't rewrite the whole of BulletPhysics or all the math code and predict every (even internal) occurrence of a denormal number.
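To find where the values turn denormal, it may help to classify them explicitly. The following is only a sketch: isDenormal and stepsUntilDenormal are hypothetical names, and the geometric decay is a simplified 1-D stand-in for the spring, not the actual physics code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true if v is a nonzero denormal (subnormal) float.
inline bool isDenormal(float v) {
    return std::fpclassify(v) == FP_SUBNORMAL;
}

// Hypothetical 1-D stand-in for the settling spring: the displacement is
// multiplied by decayPerStep every step. Returns the index of the first
// step at which the displacement is denormal, or -1 if that does not
// happen within maxSteps.
inline int stepsUntilDenormal(float displacement, float decayPerStep, int maxSteps) {
    assert(maxSteps >= 0);
    for (int step = 0; step < maxSteps; ++step) {
        if (isDenormal(displacement)) {
            return step;
        }
        displacement *= decayPerStep;
    }
    return -1;
}
```

Calling isDenormal on force1 and force2 before applyForce would show whether the spring is the source of a given denormal. Note that the check goes through the floating-point classification routines, so on an affected device the result itself should be treated with some caution.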

The linker has the workaround options --vfp11-denorm-fix and --vfp-denorm-fix: http://sourceware.org/binutils/docs-2.19/ld/ARM.html

The NDK linker has --vfp11-denorm-fix. This option helps. The code looks more reliable, but it does not fix the problem 100%.

I see fewer bugs now.

But if I wait for the spring to stabilize the object, I eventually get denorm -> NaN again.

I have to wait longer, but the same problems appear.

If you know a solution that fixes the code the way --vfp11-denorm-fix is supposed to, I will give you the bounty.

I tried both --vfp11-denorm-fix=scalar and --vfp11-denorm-fix=vector

Flush to Zero bit

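      int x;
      // compiles in ARM mode
      // 16777216 == 1 << 24, the FZ (flush-to-zero) bit of FPSCR
      asm volatile(   // volatile: keep the asm even though x is never read
              "vmrs %[result],FPSCR \n"
              "orr %[result],%[result],#16777216 \n"
              "vmsr FPSCR,%[result]"
              :[result] "=r" (x) : :
      );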

I am not sure why, but this requires LOCAL_ARM_MODE := arm in Android.mk. Maybe -mfpu=vfp-d16 is required instead of just vfp.

Manually clear denormal numbers

I have the spring code described above. I improved it by clearing denormal numbers manually, without using the FPU, with the following function.

inline void fixDenorm(float & f){
    union FloatInt32 {
        unsigned int u32;
        float f32;
    };
    FloatInt32 fi;
    fi.f32 = f;

    unsigned int exponent = (fi.u32 >> 23) & ((1 << 8) - 1);
    if(exponent == 0)
        f = 0.f;
}
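Reading a union member other than the one last written is a GCC-supported extension rather than standard C++. As a sketch only, the same exponent test can be written with std::memcpy; fixDenormMemcpy is a hypothetical name, and the code assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <cstring>
#include <stdint.h>

// Hypothetical variant of fixDenorm: zeroes f when its biased exponent
// field is 0, i.e. when f is a denormal (or already +0/-0).
inline void fixDenormMemcpy(float &f) {
    assert(sizeof(float) == sizeof(uint32_t));  // assumption: 32-bit float
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);        // copy the raw bit pattern
    const uint32_t exponent = (bits >> 23) & 0xFFu;
    if (exponent == 0) {
        f = 0.0f;
    }
}
```

Like the original, this also turns -0 into +0, which is usually harmless for forces and positions.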

The original code failed within 15-90 seconds of starting, in many places.

The current code showed an issue possibly related to this bug in only one place, after 10 minutes of physics simulation.

Reference to the bug and the fix: http://sourceware.org/ml/binutils/2006-12/msg00196.html

They say that GCC uses only scalar code, so --vfp11-denorm-fix=scalar is enough. It adds 1 extra instruction to slow things down. But even --vfp11-denorm-fix=vector, which adds 2 extra instructions, is not enough.

The problem is not easily reproducible. On phones with a higher clock frequency (800 MHz) I see it more often than on slower ones (600 MHz). It is possible that the fix was made when there were no fast CPUs on the market.

We have many files in the project, and compiling each configuration takes around 10 minutes. Testing the current state of the fix requires ~10 minutes of play on the phone. We also heat the phone under a lamp, because a hot phone shows errors faster.

I would like to test different configurations and report which fix is most efficient. But right now we have to add a hack to kill the last bug possibly related to denorms.

I expected to find a silver bullet that would fix it, but only -msoft-float (with a 10x performance degradation) or running the app on ARMv7 does.

After I replaced the previous fixDenorm function with the new fixDenormE in the spring/damping code and applied the new function to ViewMatrix, I got rid of the last bug.

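#include <math.h>   // fabsf

inline void fixDenormE(float & f, float epsilon = 1e-8f){
    union Data32 {
        unsigned int u32;
        float f32;
    };
    Data32 d;
    d.f32 = f;

    // exponent field == 0: denormal (or zero), cleared without the FPU
    unsigned int exponent = (d.u32 >> 23) & ((1 << 8) - 1);
    if(exponent == 0)
        f = 0.f;
    // also clear small normal values before they can decay into denormals
    if(fabsf(f) < epsilon){
        f = 0.f;
    }
}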
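Applying the function to ViewMatrix means calling it on every element. A minimal sketch of that, assuming the matrix is stored as a plain array of floats: flushSmallValues is a hypothetical helper, not part of BulletPhysics, and it repeats the two checks of fixDenormE inline so that the block is self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <stdint.h>

// Hypothetical helper: zeroes every element that is denormal or whose
// magnitude is below epsilon. Assumes a 32-bit IEEE 754 float.
inline void flushSmallValues(float *values, std::size_t count, float epsilon = 1e-8f) {
    assert(values != NULL || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        uint32_t bits;
        std::memcpy(&bits, &values[i], sizeof bits);
        const bool exponentIsZero = ((bits >> 23) & 0xFFu) == 0;
        // The bit test runs first, so a denormal is never handed to fabs.
        if (exponentIsZero || std::fabs(values[i]) < epsilon) {
            values[i] = 0.0f;
        }
    }
}
```

For a 4x4 matrix kept in a hypothetical float m[16], the call would be flushSmallValues(m, 16). NaN and infinity are left unchanged, because their exponent field is not zero and the comparison with epsilon is false.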

Question 2

This page has an interesting discussion on ARM FPU options: VfpComparison

I think if you want to build for ARMv6, you might do this: -march=armv6 -mcpu=generic-armv6 -mfloat-abi=softfp (and leave out the -mfpu option). If you are not targeting specifically the processor you mentioned above, generic ARMv6 doesn't have a guaranteed FPU.

Another option is to try -mfloat-abi=hard, on the theory that there is a compiler bug somewhere around softfp.

Also check your code for stack corruption and similar problems; it is possible that you clobber floating-point values while they are being passed.

P.S. You might also want to try out a floating-point tester such as TestFloat or the venerable netlib paranoia. While you have an example of floating point failing on this particular processor and with these compiler options, you don't know how widespread a problem it is. It could be worse than you think :)