Fast trigonometric functions using only integer in c++ for arm target

Question 1

Exynos 4412 uses the Cortex-A9 core[1], which has fully pipelined single- and double-precision floating-point. There is no reason to resort to integer operations, as there was with some older ARM cores.

Depending on your specific accuracy requirements (and especially if you can guarantee that the inputs fall into a limited range), you may be able to use approximations that are significantly faster than the implementations available in the standard library. More information about your exact usage would be necessary to give sound advice.

[1] http://en.wikipedia.org/wiki/Exynos_(system_on_chip)

Question 2

If your target platform has a math library, use it. If it is any good, it was written by experts who were considering speed. You should not base code design on guesses about what is fast or slow. If you do not have actual measurements or processor specifications, and you do not know trigonometric functions in your application are consuming a lot of time, then you do not have good reason for replacing the math libraries.

Floating-point instructions typically have longer latencies than integer instructions, but they are pipelined so that throughput may be comparable. (E.g., a floating-point unit might have four stages to do the work, so an instruction takes four cycles to work through all the stages, but you can push a new instruction into the first stage in each cycle.) Whether the pipelining is sufficient to provide performance on a par with an integer implementation depends greatly on the target processor, the algorithm being used, and the skill of the implementor.

If it is beneficial in your case to use custom implementations of the math routines, then how they should be designed is hugely dependent on circumstances. Proper advice depends on the domain to support (Just 0 to 2π? –2π to +2π? Possibly larger values, which have to be folded to -π to π?), what special cases needed to be supported (Propagate NaNs?), the accuracy required, what else is happening in the processor (Is a lot of memory in use or can we rely on a lookup table remaining in cache?), and more.

A significant part of the trigonometric routines is handling various cases (NaNs, infinities, small values) and reducing arguments modulo 2π. It may be possible to implement stripped-down routines that do not handle special cases or perform argument reduction but still use floating-point.

Question 3

One possible alternative is trigint:

Question 4

You should use "fixed point" math rather than floating point.

Most ARM processors (7 and above) allow for 32 bits of resolution in the fixed point. So you could go to 1E-3 radians quite easily. But the real question is how much accuracy do you need in the results?

Whether to use lookup tables, lookup tables with interpolation or functions depends on how much data space you have on your system. Lookup tables are fastest execution, but use the most data space. Functions use the least amount of data but require the most execution time. Interpolation may be a mitigation that allows smaller tables and some extra processing.