Question

I'm using the threaded version of FFTW (an FFT library) to try to speed up some code on a dual-CPU machine. Here is the output of time with only one thread:

131.838u 1.979s 2:13.91 99.9%

Here it is with two threads:

166.261u 30.392s 1:52.67 174.5%

The user times and the CPU load percentages seem to indicate that it is threading pretty effectively, but the wall-clock time (which is what I really care about) tells me (I think) that it is taking around 28 extra seconds to deal with the threads. Is that an accurate way to describe the situation? If so, is that fairly normal, or do I probably have something misconfigured? Thanks for any light you can shed.

Solution

I've used FFTW a fair amount, and have found that, unless you're going to more than two processors, it's almost always a cleaner solution to just use the single-threaded version. It's often faster too, because there's less inter-thread communication, or at least that's been my experience.
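
To put numbers on the overhead, here is a minimal sketch in C that works through the two `time` lines from the question. The struct and function names are my own, purely for illustration; the only inputs are the figures quoted above, with 2:13.91 and 1:52.67 converted to seconds.

```c
#include <assert.h>

/* One line of csh `time` output, all values in seconds. */
typedef struct {
    double user;  /* user CPU time, summed over all threads   */
    double sys;   /* system CPU time, summed over all threads */
    double wall;  /* elapsed wall-clock time                  */
} TimeSample;

/* Total CPU seconds consumed by a run. */
double cpu_seconds(TimeSample t) { return t.user + t.sys; }

/* Wall-clock speedup of the threaded run (greater than 1 means faster). */
double speedup(TimeSample serial, TimeSample threaded) {
    assert(threaded.wall > 0.0);
    return serial.wall / threaded.wall;
}

/* Wall-clock seconds saved by the threaded run. */
double wall_saved(TimeSample serial, TimeSample threaded) {
    return serial.wall - threaded.wall;
}

/* Extra CPU seconds (user + system) the threaded run burned. */
double cpu_overhead(TimeSample serial, TimeSample threaded) {
    return cpu_seconds(threaded) - cpu_seconds(serial);
}

/* The two runs from the question: 2:13.91 = 133.91 s, 1:52.67 = 112.67 s. */
static const TimeSample ONE_THREAD  = {131.838,  1.979, 133.91};
static const TimeSample TWO_THREADS = {166.261, 30.392, 112.67};
```

Calling `speedup(ONE_THREAD, TWO_THREADS)` gives roughly 1.19, so the second CPU bought about 21 seconds of wall-clock time at the cost of about 63 extra CPU seconds. The "around 28 extra seconds" in the question matches the jump in system time (30.392 s versus 1.979 s), not a loss of wall-clock time.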

A few things to check out:

  1. Are you configuring your wisdom properly, and using it? Wisdom, once created, lets FFTW reuse its earlier planning results, which can make your transform run much more quickly. If you aren't using it, you should be.
  2. Are you calling the library from one thread, or from two? That was always my problem: locking calls into the library from multiple threads got to be painful.
  3. How big are your transforms? Are you trying with a small one at first, just to see how it goes, then scaling up?
Licensed under: CC-BY-SA with attribution