Is there really a timeout for kernels on NVIDIA GPUs?

https://stackoverflow.com/questions/5117961

24-12-2020

Question

While searching for why my kernels produce strange error messages or results that are all "0", I found this answer on SO. It mentions a 5 s timeout for kernels running on NVIDIA GPUs. I googled for the timeout but could not find any confirming sources or further information.

What do you know about it?

Could the timeout cause strange behaviour in kernels with a long runtime?

Thanks!

Solution

Further googling turned up this passage in CUDA_Toolkit_Release_Notes_Linux.txt (Known Issues):

# Individual GPU program launches are limited to a run time of less than 5 seconds on a GPU with a display attached. Exceeding this time limit usually causes a launch failure reported through the CUDA driver or the CUDA runtime. GPUs without a display attached are not subject to the 5 second runtime restriction. For this reason it is recommended that CUDA be run on a GPU that is NOT attached to a display and does not have the Windows desktop extended onto it. In this case, the system must contain at least one NVIDIA GPU that serves as the primary graphics adapter.
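The release note says the failure is "reported through the CUDA driver or the CUDA runtime". That only helps if the program actually checks for it. A kernel killed by the watchdog never writes its output buffers, so code that skips error checks can end up copying back zeros or stale data, which would match the "0"-only results in the question. The sketch below shows one way to catch this with the runtime API. The `spinKernel` busy-wait kernel and the `runAndCheck` helper are hypothetical and exist only for illustration. The exact error code may also vary by platform: under WDDM on Windows, a TDR reset can surface as a different error than `cudaErrorLaunchTimeout`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo kernel: each thread busy-waits for roughly `cycles`
// GPU clock ticks. A large enough value exceeds the watchdog limit on a
// GPU that drives a display.
__global__ void spinKernel(long long cycles, int *out) {
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin
    }
    out[threadIdx.x] = 1;
}

// Launches spinKernel and returns the first error observed, either from
// the launch itself or from the kernel's execution.
cudaError_t runAndCheck(long long cycles) {
    int *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, 32 * sizeof(int));
    if (err != cudaSuccess) return err;

    spinKernel<<<1, 32>>>(cycles, d_out);

    // Launch-configuration errors are reported right away...
    err = cudaGetLastError();
    // ...but failures during execution, including a watchdog kill, only
    // show up once the host synchronizes with the device.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    cudaFree(d_out);
    return err;
}

int main() {
    cudaError_t err = runAndCheck(1LL << 20);  // raise this to provoke a timeout
    if (err == cudaErrorLaunchTimeout) {
        std::fprintf(stderr, "watchdog terminated the kernel: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("kernel completed\n");
    return 0;
}
```

According to the CUDA runtime documentation, `cudaErrorLaunchTimeout` leaves the process in an inconsistent state. Further CUDA calls keep returning the same error, so in practice the process has to be restarted rather than simply retrying the launch.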

[update] It seems that the official name for this feature is 'watchdog'.
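Whether the watchdog applies to a given GPU can be queried at run time instead of guessed from the display setup. The sketch below reads the `cudaDevAttrKernelExecTimeout` attribute, which corresponds to `cudaDeviceProp::kernelExecTimeoutEnabled`. The `deviceQuery` sample from the CUDA samples reports the same flag as "Run time limit on kernels". The `watchdogEnabled` helper name is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns 1 if the kernel run-time limit (watchdog) is enabled on `dev`,
// 0 if it is not, and -1 if the query itself fails.
int watchdogEnabled(int dev) {
    int value = 0;
    if (cudaDeviceGetAttribute(&value, cudaDevAttrKernelExecTimeout, dev) != cudaSuccess)
        return -1;
    return value != 0 ? 1 : 0;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        int w = watchdogEnabled(dev);
        std::printf("device %d (%s): run-time limit %s\n", dev, prop.name,
                    w == 1 ? "ENABLED" : (w == 0 ? "disabled" : "unknown"));
    }
    return 0;
}
```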

Other answers

If you're on Windows Vista or later, the WDDM driver stack will automatically reset the device after about two seconds unless you adjust the TDR (Timeout Detection and Recovery) settings. Windows can't tell the difference between a GPU running a lengthy kernel and a GPU that has locked up. Tesla-branded cards running in TCC mode aren't subject to the normal display adapter restrictions and can therefore run longer kernels.
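If you can't move the work to a GPU without a display or change the TDR settings, a common workaround is to split one long job into several shorter launches. The release note limits *individual* launches, so each launch only needs to stay under the limit on its own. The sketch below assumes a simple element-wise workload. `processChunk` and `processInChunks` are hypothetical names, and the chunk size is an assumption you would tune by timing on the target machine.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Stand-in for expensive per-element work (hypothetical). The offset/count
// parameters let one logical job be split across several launches.
__global__ void processChunk(float *data, size_t offset, size_t count) {
    size_t local = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (local < count) {
        size_t i = offset + local;
        data[i] = data[i] * 2.0f + 1.0f;
    }
}

// Processes n elements in launches of at most `chunk` elements each and
// returns true if every element ends up with the expected value.
bool processInChunks(size_t n, size_t chunk) {
    if (chunk == 0) return false;
    const int threads = 256;
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemset(d_data, 0, n * sizeof(float));  // every element starts at 0.0f

    for (size_t offset = 0; offset < n; offset += chunk) {
        size_t count = (n - offset < chunk) ? n - offset : chunk;
        unsigned blocks = (unsigned)((count + threads - 1) / threads);
        processChunk<<<blocks, threads>>>(d_data, offset, count);
        // Check the launch, then wait for this chunk to finish (surfacing
        // any execution error) before launching the next one.
        cudaError_t err = cudaGetLastError();
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            std::fprintf(stderr, "chunk at %zu failed: %s\n", offset,
                         cudaGetErrorString(err));
            cudaFree(d_data);
            return false;
        }
    }

    std::vector<float> host(n);
    cudaMemcpy(host.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    for (size_t i = 0; i < n; ++i)
        if (host[i] != 1.0f) return false;  // 0.0f * 2 + 1 is exactly 1.0f
    return true;
}

int main() {
    // Assumed chunk size: tune it so each launch runs well under the
    // watchdog / TDR limit on the target machine.
    bool ok = processInChunks(1 << 24, 1 << 20);
    std::printf("%s\n", ok ? "all chunks processed" : "failed");
    return ok ? 0 : 1;
}
```

Synchronizing after every chunk costs some throughput. It does, however, make each launch's failure visible at the chunk that caused it rather than at some later call.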

Licensed under: CC-BY-SA with attribution
