Question

I am working on a comparative study in which I have to compare the serial and parallel versions of an algorithm (NSGA-II, to be precise; download link here). NSGA-II is a heuristic optimization method, so its results depend on the initial random population. If the initial populations generated on the CPU and the GPU are different, I cannot make an impartial speedup study.

I have an NVIDIA Tesla C1060 card, which has a compute capability of 1.3. As per this answer and this NVIDIA document, we can't expect an sm_13 device to always yield an IEEE 754 compliant float (single-precision) value. In other words, on my current device I cannot conduct an impartial speedup study of the CUDA program against its serial counterpart.

My question is: would switching to the Fermi architecture solve the problem?

Solution

Floating-point operations can yield different results on different architectures, whether or not they comply with IEEE 754, because floating-point arithmetic is not associative: evaluating the same expression in a different order rounds differently. Even switching compilers on x86 will typically give different results. This whitepaper gives some excellent explanations.
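
To make the non-associativity point concrete, here is a minimal host-side C++ sketch (not taken from NSGA-II or from the whitepaper). It adds the same floats in two orders: a single left-to-right accumulator, as a serial loop would, and a pairwise tree, which is the shape a parallel reduction usually takes. The function names sum_serial and sum_pairwise are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial-style reduction: one accumulator, left to right.
float sum_serial(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) acc += x;
    return acc;
}

// Tree-style reduction: neighbouring pairs are added first, then pairs of
// partial sums, and so on until one value is left.
float sum_pairwise(const std::vector<float>& v) {
    assert(!v.empty());  // a tree reduction needs at least one element
    std::vector<float> level = v;
    while (level.size() > 1) {
        std::vector<float> next;
        for (std::size_t i = 0; i + 1 < level.size(); i += 2)
            next.push_back(level[i] + level[i + 1]);
        // An odd element out is carried up to the next level unchanged.
        if (level.size() % 2 == 1) next.push_back(level.back());
        level = next;
    }
    return level[0];
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f}, on a platform that evaluates float expressions in IEEE 754 single precision, sum_serial returns 1 and sum_pairwise returns 0. Both results are correctly rounded at every step; only the grouping of the additions differs.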

Having said that, your real issue is that your algorithm is data-dependent: the operations it performs depend on the random numbers you generate. If you generate the same numbers on the CPU and the GPU, both runs will follow the same paths. Consider using cuRAND, which can generate the same numbers on both the CPU and the GPU.
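
As a rough illustration of the "same numbers on both sides" idea, the sketch below does not use cuRAND. It swaps in a simpler approach: build the initial population once on the host from a fixed seed, then hand the identical array to both the serial code and the CUDA code (for example by copying it to the device). The helper name make_initial_population, the flat array layout, and the [0, 1) gene range are assumptions made for this example, not part of NSGA-II.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: builds the initial population as a flat array of
// genes in [0, 1), fully determined by the seed.
std::vector<float> make_initial_population(std::uint32_t seed,
                                           std::size_t individuals,
                                           std::size_t genes_per_individual) {
    assert(individuals > 0 && genes_per_individual > 0);
    std::mt19937 gen(seed);  // the output sequence is fixed by the C++ standard
    std::vector<float> population(individuals * genes_per_individual);
    for (float& gene : population) {
        // Keep the top 24 bits so the value is exact in a float and below 1.
        // std::uniform_real_distribution is avoided on purpose: its output
        // may differ between standard library implementations.
        gene = static_cast<float>(gen() >> 8) * (1.0f / 16777216.0f);
    }
    return population;
}
```

Calling make_initial_population with the same seed and sizes returns the same array every time, so the serial and parallel runs start from identical data. If the random numbers must be produced during the run rather than up front, the cuRAND route suggested above is the closer fit: its host API offers curandCreateGeneratorHost and curandCreateGenerator, which you would configure with the same generator type and seed.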

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow