Is there really a timeout for kernels on nvidia gpus?
-
24-12-2020 - |
문제
searching for answers for why my kernels produce strange error messages or "0" only results I found this answer on SO that mentions that there is a timeout of 5s for kernels running on nvidia gpus? I googled for the timout but I could not find confirming sources or more information.
What do you know about it?
Could the timout cause strange behaviour for kernels with a long runtime?
Thanks!
해결책
Further googling brought up this in the CUDA_Toolkit_Release_Notes_Linux.txt (Known Issus):
# Individual GPU program launches are limited to a run time of less than 5 seconds on a GPU with a display attached. Exceeding this time limit usually causes a launch failure reported through the CUDA driver or the CUDA runtime. GPUs without a display attached are not subject to the 5 second runtime restriction. For this reason it is recommended that CUDA be run on a GPU that is NOT attached to a display and does not have the Windows desktop extended onto it. In this case, the system must contain at least one NVIDIA GPU that serves as the primary graphics adapter.
[update] It seems that the official name for this feature is 'watchdog'.
다른 팁
If you're on Windows Vista or later, the WDDM driver stack will automatically reset the device after about two seconds unless you tweak your TDR timeouts. (Windows can't tell the difference between a GPU running a lengthy kernel and a GPU that's locked up.) Tesla-branded cards running in TCC mode aren't subject to the normal display adapter restrictions and can therefore run longer kernels.