Question

When I start my program and open communication with my card (a Tesla K20c), I lose a lot of time on the first call to the card inside my code, and I don't know why. It seems that every time it tries to search for all possible cards. If I use cudaSetDevice() to select my card, I have the same problem.

user time (s): 1.420

system time (s): 4.660

elapsed time (s): 6.490

The system time represents this lost time. When I run my program on another computer with a GeForce GTX 560 Ti (a less powerful and older card), you can see the system time is normal.

user time (s): 1.620

system time (s): 0.700

elapsed time (s): 3.120

This problem doubles the running time of the program, and I would like to understand why. This is the first time I have had this kind of problem with a card.

Is it because the card is too recent and the CUDA library is not yet optimized for it?

I am using CUDA 5.0.
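
For reference, here is a minimal sketch (not my actual program) that isolates the first call; the cudaFree(0) idiom is only there to force the context creation at a known point:

/* first_call.cu - time the first CUDA runtime call separately from later ones.
   Compile with: nvcc first_call.cu -o first_call */
#include <stdio.h>
#include <time.h>
#include <cuda_runtime.h>

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    double t0 = seconds();
    cudaSetDevice(0);   /* first runtime call: loads the driver             */
    cudaFree(0);        /* common idiom to force context creation right now */
    double t1 = seconds();

    float *d;
    cudaMalloc((void **)&d, 1 << 20);  /* later calls reuse the context */
    cudaFree(d);
    double t2 = seconds();

    printf("first call (driver load + context creation): %.3f s\n", t1 - t0);
    printf("subsequent calls:                            %.3f s\n", t2 - t1);
    return 0;
}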


Solution

This is due to the CUDA driver being loaded and a CUDA context (an environment where all your data and programs are held on the device) being created every time the program starts, which requires a lot of bookkeeping. You can force the driver to stay loaded at all times by running the following as root:

nvidia-smi -pm 1

This enables the so-called "persistence mode" (set it to 0 to disable it), which will speed up your initialisation.
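
If you want to check from code whether persistence mode is actually enabled, NVML (shipped with the driver) exposes it. A minimal sketch, assuming nvml.h is available and you link with -lnvidia-ml:

/* persistence_check.c - report the persistence mode of GPU 0 via NVML.
   Compile with: gcc persistence_check.c -lnvidia-ml -o persistence_check */
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "failed to initialise NVML\n");
        return 1;
    }

    nvmlDevice_t dev;
    nvmlEnableState_t mode;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS &&
        nvmlDeviceGetPersistenceMode(dev, &mode) == NVML_SUCCESS) {
        printf("persistence mode: %s\n",
               mode == NVML_FEATURE_ENABLED ? "enabled" : "disabled");
    }

    nvmlShutdown();
    return 0;
}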

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow