Question

Say I had a software algorithm (for example, an FFT), and I need it to process (n) amounts of data in (t) milliseconds. This is a real-time task written in C. There are a lot of CPUs out there, and you could just select the fastest one, but we also want something just right for the job while reducing cost.

FFTs are O(n log n) as far as I know, so maybe one can say that it would take k * (n log n) to perform an FFT on n units of data. Even if the constant was known, how would I translate that to actual CPU cycles, in order to determine which CPU is suitable?

A colleague from work posed this question to me and I couldn't answer it, as this is in computer engineering territory which I'm not familiar with.

Assume that this software program runs on its own, with no OS or other overhead involved.


Solution

"Big O notation" is meaningless for this purpose. If you know n you can estimate the number of operations, but some operations are cheap (e.g. 1 cycle) and some operations are not (e.g. cache miss costing thousands of cycles); and the best estimate you can reach using this method is going to be dwarfed by "estimation error". Note: "Big O notation" is mostly only useful for theoretical comparisons of algorithms that do the same thing on the same real or hypothetical machine.

In practice, you want to benchmark real code on a real CPU with realistic data. Based on this you'd apply "weighted modifiers" to adapt the estimate to other CPUs, taking into account things like differences in memory hierarchy (latency, cache sizes), differences in CPUs (is one faster/slower than the other for SIMD, floating point, integer operations), differences in compilers (does one optimise much better/worse than another), etc. (where the "weights" come from how much you think each thing affects performance).

It's easy to say "I benchmarked it on a 3 GHz CPU and it took 10 seconds for a realistic load; so I guess it'll take about 45 seconds on a 1 GHz CPU with slower RAM and slightly worse floating point."

OTHER TIPS

You need some reliable benchmarks for the CPU in question, and for comparable CPUs, of which at least one is available to you. Then you develop the algorithm, measure it on the CPUs that you have, and taking the benchmark data, guess how long the other CPUs will take.

Then you decide which CPUs might be able to handle your algorithm, and select the ones that are cheapest while being able to do the task. And then you implement your algorithm and check that it is indeed fast enough.

There's always the possibility that some CPUs are tuned more for good benchmark results than for real-life performance, so conclusions drawn from published benchmarks may turn out to be wrong once you measure your actual workload on a particular CPU.

The complexity of an algorithm will be the same, whether it's implemented in hardware or software. You might get a much lower 'k' if it's entirely implemented in CPU, but it's still O(n log n).

In general terms, assuming the work is CPU-bound and you are staying within a CPU series where the difference between model numbers is principally clock speed, the difference between differently clocked but otherwise identical CPUs will be roughly proportional to the difference in clock speed.
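That first-order rule is just a ratio; as a trivial helper (illustrative only, and only valid under the "otherwise identical" assumption):

```c
/* First-order estimate: for CPU-bound work on otherwise identical cores,
   execution time scales inversely with clock frequency. */
static double scale_by_clock(double measured_ms,
                             double measured_ghz, double target_ghz)
{
    return measured_ms * (measured_ghz / target_ghz);
}

/* e.g. 10 ms measured at 3 GHz suggests roughly
   scale_by_clock(10.0, 3.0, 1.0) = 30 ms at 1 GHz. */
```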

Changing between different revisions of the same CPU family can change the behaviour significantly. Whilst the underlying assembler instructions processed may be largely identical, the hardware optimisation may differ in ways that affect your particular task. There may be fewer opportunities for parallel execution, the instruction pipeline and branch prediction may be less favourable, or some of the assembler instructions may be implemented as microcode (a mini software program inside the CPU) and take many more CPU cycles to complete.

Changing between CPU architectures such as switching from x86 to ARM or XScale makes all of the above more likely.

There is also another aspect to this. Cost is more than just the purchase cost of the CPU. There is a wider chipset needed on the processor board, and then, depending on the operating environment, the electricity cost of running the processor (and, if battery powered, the cost differential for bigger/smaller batteries), plus the replacement cost for failed CPUs. Typically, hotter-running CPUs do not last as long.

Licensed under: CC-BY-SA with attribution