Question

When we say that a parallel CUDA program on the GPU achieves a speedup over a comparable sequential program on the CPU, should the sequential one be compiled with compiler optimization enabled (for example, gcc -O2)?

I have parallelized a program for the GPU. It achieves a speedup of 18 over its CPU implementation when compiler optimization is disabled. But when I add the -O2 option to the nvcc compiler, the speedup drops to 8.


Solution

Yes, of course: when comparing performance, compiler optimization should be enabled for both the GPU and the CPU program.
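To see why this matters, the two figures from the question can be compared directly. This is only a rough sketch, assuming the GPU run time $T_{\mathrm{gpu}}$ was the same in both measurements (the question does not state this) and that only the CPU baseline changed:

```latex
\begin{align*}
S &= \frac{T_{\mathrm{cpu}}}{T_{\mathrm{gpu}}}
  && \text{(definition of speedup)} \\
T_{\mathrm{cpu}}^{\text{no opt}} &= 18\, T_{\mathrm{gpu}},
\qquad
T_{\mathrm{cpu}}^{\text{-O2}} = 8\, T_{\mathrm{gpu}} \\
\frac{T_{\mathrm{cpu}}^{\text{no opt}}}{T_{\mathrm{cpu}}^{\text{-O2}}}
  &= \frac{18}{8} = 2.25
\end{align*}
```

Under that assumption, a factor of 2.25 in the reported speedup of 18 came from the unoptimized CPU baseline rather than from the GPU, which is why 8 is the fairer number to report.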

If your focus is GPU vs. CPU, the comparison should not be affected by the quality of the software. The usual assumption is that each version of the code achieves the best performance it can on its own hardware.
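As an illustration, here is a minimal sketch of how the CPU side of such a comparison might be timed. The workload `saxpy_cpu` and the helpers `time_ms` and `speedup` are hypothetical names chosen for this example, not part of the original program; the GPU time is assumed to be measured separately (for instance with CUDA events) and passed in as a number.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical sequential CPU baseline: y[i] = a * x[i] + y[i].
void saxpy_cpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Wall-clock time of one call to f, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Speedup of the GPU version over the CPU baseline.
// gpu_ms is assumed to come from a separate GPU measurement.
double speedup(double cpu_ms, double gpu_ms) {
    assert(gpu_ms > 0.0);
    return cpu_ms / gpu_ms;
}
```

The point is less the code than how it is built: the baseline should be compiled with optimization on (for example `g++ -O2 baseline.cpp`, where the file name is a placeholder), and the same optimization level should be used in every run that is compared. In practice the timed call would also be repeated several times, since a single measurement can be noisy.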

Licensed under: CC-BY-SA with attribution