I was able to resolve the issue. There were three points missing.
First, compiler optimization was not enabled. Turning on -O2 significantly improved performance, so the CFLAGS are now:
CFLAGS="-O2 -mfpu=vfpv3-d16 -mhard-float -D_NDK_MATH_NO_SOFTFP=1"
The second point is library- and platform-specific. This post (Application hang after call nested function with Android NDK) gives the answer. In short, add
--with-slow-timer
Third, as @andrewsieh said, it is necessary to edit configure so that libm is not always linked before libm_hard.a.
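
For reference, here is a rough sketch of how the first two points could be combined into one configure call. The `--host` value and the assumption that `./configure` is run from the library's source directory are mine and not taken from the setup above, so adjust them to your toolchain. The script only prints the command instead of running it, so you can check it first.

```shell
#!/bin/sh
# Sketch only: HOST is an assumed value, adjust it to your NDK toolchain.
HOST="arm-linux-androideabi"

# Flags from the answer: optimization plus the hard-float ABI settings.
CFLAGS="-O2 -mfpu=vfpv3-d16 -mhard-float -D_NDK_MATH_NO_SOFTFP=1"

# Configure option taken from the linked post.
CONFIGURE_ARGS="--host=$HOST --with-slow-timer"

# Build the command as a string and print it rather than executing it.
CONFIGURE_CMD="./configure $CONFIGURE_ARGS CFLAGS=\"$CFLAGS\""
echo "$CONFIGURE_CMD"
```

The edit to configure for the libm / libm_hard.a link order is not covered here, since it depends on the configure script of the library in question.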