Question

I am writing an iPhone app that needs to calculate the square root of a number about 2000 times every 1/30th of a second. sqrt() works fine on a computer, but the frame rate drops to around 10 FPS on an iPhone or iPad, and I have already optimized the rest of the code. I have heard that this can be sped up dramatically by estimating the square root, but I can not find any code to do this. I only need one or two decimal places of precision. Any suggestions on how to do this, or other ways to speed things up would be appreciated.

Thanks!

Was it helpful?

Solution

How accurate do you want your estimate to be? If you know how close you want your estimate to be to the real sqrt the Newton's Method is your friend.

Do you know the range of values that are passed to sqrt? If so you can make up a look up table that is precomputed at startup (or even read from disk at startup depending on what turns out to be faster). Find the closest in the table to your input and you get your estimate.

OTHER TIPS

Unless you actually need the square root, compare the squared values rather than the raw values and the square root.

Squaring is much faster (and more accurate) than taking a square root, if you only need comparisons. This is the way most games do it.

Do you know the range of values that you are trying to find the square root of? Say you have values ranging from 0 to 10. You can then precalculate an array:

sqrt_val[0] = 0;
sqrt_val[1] = 1;
sqrt_val[2] = // the sqrt of 2
...
sqrt_val[10] = // the sqrt of 10

Then during runtime you take the number that you want the sqrt of, convert that to an integer (so for example 3.123 becomes 3) and use that as an index (3) to look up the precalculated value.

Of course if you want finer resolution you can just increase the number of items in your array.

First off, are you certain that square root is actually the bottleneck? Have you profiled? 2000 square roots every 1/30th of a second actually isn't all that many, even on a cell phone. The ARM documentation quotes 33 cycles for a single-precision square root and 60 cycles for double-precision; a 600mHz processor can do 10 million square roots per second (more if the instruction is pipelined at all).

If you have profiled, and square root really is the bottleneck, you will want to use the NEON vrsqrte.f32 instruction. This instruction is quite fast and gives you the approximate reciprocal square roots of four floating-point numbers simultaneously. You can then use the vmul.f32 instruction to get approximate square roots (though for many uses the reciprocal is more useful than the square root itself).

Maybe this is for you:
Fast inverse square root
If this method doesn't provide the accuracy you need there are also alot of other iterative methods where you can choose more or less precise between speed and accuracy:
Methods of computing square roots

The easiest change you can make on an iPhone is to use sqrtf() instead of sqrt(). Single precision float math is much faster than double precision, especially on devices of 3GS vintage and newer.

If you need the square root to calculate a Pythagoras triangle (sqrt(x*x + y*y)), and both x and y are nonnegative, then a very fast approximation to that is

max(x,y) + min(x,y)*0.333

This has a maximum error of 5.7%. Watch out for branch misprediction in min() and max() though.

If you have a "normal" positive float or double, not an int, and want to use a table look-up method, you can do two separate table look ups, one for the exponent (re-biased), and one for a few bits of the mantissa (shift and mask bitfield extraction), and then multiply the two table look up results together.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top