Question

I am looking at the benchmarks of an FFT library and wondering why double-precision math would be faster than single-precision (even on 32-bit hardware).

Solution

Assuming Intel CPUs, it all depends on the compiler. When compiling a 32-bit application, you can use the traditional x87 floating-point instructions, where single precision (SP) and double precision (DP) run at the same speed. Alternatively, you can select SSE for SP and SSE2 for DP; SSE packs four SP words into a 128-bit register, so it can be twice as fast as SSE2, which packs only two DP words. When compiling for 64 bits, compilers normally do not generate x87 code, so floating point is always compiled to SSE/SSE2. Depending on the compiler and the particular program, these instructions may be generated as SIMD (Single Instruction Multiple Data: 4 SP or 2 DP words at a time) or as scalar SISD code (Single Instruction Single Data: one word per register). In the scalar case, I suppose, SP and DP will run at a similar speed, and the code can be slower than the 32-bit compilation.

When the data comes from RAM, and possibly from cache, performance can be limited by bus speed, and there SP will be faster than DP because it moves half the bytes. If the code is like my FFT benchmarks, the access pattern is strided rather than purely sequential reading and writing. Speed is then affected by data being read in bursts of at least 64 bytes (a cache line), where SP is likely to be a little faster.

Library functions such as the trig functions are often calculated internally in DP. In that case the SP versions can be a bit slower than expected because of the DP-to-SP conversion.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow