cublas failed to synchronize stop event?

Question

In general, when using CUDA on windows, it's necessary to make sure the execution time of a single kernel is not longer than about 2 seconds. If the execution time becomes longer, you may hit a windows TDR event. This is a windows watchdog timer that will reset the GPU driver if it does not respond within a certain period of time. Such a reset halts the execution of your kernel and generates bogus results, as well as usually a briefly "black" display and a brief message in the system tray. If your kernel execution is triggering the windows watchdog timer, you have a few options:

If you have the possibility to use more than one GPU in your system (i.e. usually not talking about a laptop here) and one of your GPUs is a Quadro or Tesla device, the Quadro or Tesla device can usually be placed in TCC mode. This will mean that GPU can no longer driver a physical display (if it was driving one) and that it is removed from the WDDM subsystem, so is no longer subject to the watchdog timer. You can use the nvidia-smi.exe tool that ships with the NVIDIA GPU driver to modify the setting from WDDM to TCC for a given GPU. Use your windows file search function to find nvidia-smi.exe and then use nvidia-smi --help to get command line help for how to switch from WDDM to TCC mode.
If the above method is not available to you (don't have 2 GPUs, don't have a Quadro or Tesla GPU...) then you may want to investigate changing the watchdog timer setting. Unfortunately this requires modifying the system registry, and the process and specific keys vary by OS. There are a number of resources on the web, such as here from Microsoft, as well as other questions on Stack Overflow, such as here, which may help with this.
A third option is simply to limit the execution time of your kernel(s). Successive operations might be broken into multiple kernel calls. The "gap" between kernel calls will allow the display driver to respond to the OS, and prevent the watchdog timeout.

The statement about TCC support is a general one. Not all Quadro GPUs are supported. The final determinant of support for TCC (or not) on a particular GPU is the nvidia-smi tool. Nothing here should be construed as a guarantee of support for TCC on your particular GPU.