Serial CPU vs GPU code

https://stackoverflow.com/questions/8712657

13-04-2021

Question

I'm writing a theoretical assignment on the possibilities of heterogeneous computing. I need to compare the performance of a single (non-parallelizable) thread executed serially on either the CPU or the GPU.

I know it's an odd question, since it doesn't make sense to execute a single thread on the GPU, but I could really use a guideline ratio for a heuristic I am developing.

I know that it could easily be tested, but I have no practical experience with either CUDA or OpenCL, and I'm in a hurry.

Solution

GPU execution units tend to be in-order, and (in the case of NVIDIA GPUs at least) you typically get only one instruction per 4 clocks in a single-threaded context. Compare this with modern superscalar CPUs, where you can typically get a throughput of more than 1 instruction per clock, and the CPU wins by a factor of 4 or more on a clock-for-clock basis. CPU clock frequencies also tend to be much higher than GPU clocks, so there could easily be a further factor of 3 from clock speed, taking the CPU to 12x or more relative to the GPU.
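As a rough illustration of that arithmetic, the estimate can be written as the product of an instructions-per-clock ratio and a clock-frequency ratio. This is only a sketch: the function name `cpu_over_gpu_ratio` and its parameters are hypothetical, and the inputs are the ballpark figures from the answer, not measurements.

```c
#include <assert.h>

/* Hypothetical helper (not from any library): estimated single-thread
   speed advantage of a CPU over a GPU.
   ipc = sustained instructions per clock, ghz = clock frequency in GHz. */
double cpu_over_gpu_ratio(double cpu_ipc, double cpu_ghz,
                          double gpu_ipc, double gpu_ghz)
{
    /* All inputs must be positive for the ratio to be meaningful. */
    assert(cpu_ipc > 0 && cpu_ghz > 0 && gpu_ipc > 0 && gpu_ghz > 0);

    /* Per-clock advantage multiplied by clock-speed advantage. */
    return (cpu_ipc / gpu_ipc) * (cpu_ghz / gpu_ghz);
}
```

Assuming 1 instruction per clock on the CPU, 1 instruction per 4 clocks (0.25) on the GPU, and illustrative clocks of 3 GHz and 1 GHz, `cpu_over_gpu_ratio(1.0, 3.0, 0.25, 1.0)` evaluates to 12, matching the 4 x 3 estimate. Real ratios depend heavily on the specific hardware and workload.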

OTHER TIPS

A single GPU core is orders of magnitude weaker than a CPU core, and there is also the overhead of data transfer. The GPU wins on parallel code because of the sheer number of threads running simultaneously on hundreds of cores. Consider this example:

for (int i = 0; i < 1000; i++)
  a[i] = value;  /* some value */

If you use only one thread, it has to iterate 1000 times, and the CPU will definitely be much faster than the GPU. To benefit from the GPU, we instead create 1000 threads, each writing its value to the array 'a' at its corresponding position. This might result in a performance gain over the CPU.
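The decomposition described above can be sketched in plain C. This is an emulation only, not GPU code: the names `store_one`, `fill_serial`, and `fill_per_element` are hypothetical, and the "threads" here still run one after another. The point is just to show which part of the loop each GPU thread would take over.

```c
#include <assert.h>
#include <stddef.h>

/* Per-element body: the work a single GPU thread would do for index i. */
static void store_one(int *a, size_t i, int value)
{
    a[i] = value;
}

/* Serial version: one thread walks over all n elements. */
void fill_serial(int *a, size_t n, int value)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = value;
}

/* Emulated per-thread version: each call to store_one stands in for
   one GPU thread with id tid. On a GPU these calls would be scheduled
   concurrently; here they simply run in order. */
void fill_per_element(int *a, size_t n, int value)
{
    assert(a != NULL);
    for (size_t tid = 0; tid < n; tid++)
        store_one(a, tid, value);
}
```

Both functions produce the same array. The difference on real hardware would be that the single-threaded loop pays the full cost of 1000 sequential iterations on one slow GPU core, while the per-element form gives the GPU 1000 independent pieces of work to spread across its cores.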

NVIDIA Tesla GPUs perform some operations at 4 ops/clock.
Memory bandwidth on a GPU is generally higher than between the CPU and main RAM, and there are some clever caching features for 2D structures.

But generally no: if you don't need to do the same thing to more than 256 items, use a CPU.

You should also consider the clock rate at which a GPU operates (1-2 GHz), which is low compared to that of CPUs. GPUs are really not a good choice when the code is not parallel.

Licensed under: CC-BY-SA with attribution
