Cannot run CUDA code that queries NVML - error regarding libnvidia-ml.so

Question 1

You are getting this error because the application that is trying to use nvml is loading the stub library that is located in:

...tdk_install_path/lib64/libnvidia-ml.so

instead of the one in:

/usr/lib64/libnvidia-ml.so

I was able to reproduce your error when I added the stub library path to my LD_LIBRARY_PATH environment variable. So that is one possible source of error, if someone added the path of the stub library that comes with the tdk distribution to your LD_LIBRARY_PATH environment variable, but probably not the only way this could happen. If someone in an unusual fashion copied the stub library to some system path, that might also be an issue.

You'll need to try and figure out why your system is loading that stub library in place of the correct one in /usr/lib64. Alternatively, for discovery purposes, you could try deleting all instances of the stub library anywhere on your system (leave the correct libraries in /usr/lib and /usr/lib64 alone), and you should be able to observe correct behavior.

Question 2

I solved the problem this way on a GTX 1070 using windows 10 : go to device manager, select the GPU that is having a problem, disable the GPU and enable back.

Question 3

I was having this same or similar issue with EWBF Cuda Miner for zCash.

Here is a way to automatically implement Pro7ech's answer (which worked for me) for WIN10:

Install WDK for Windows 10 if you don't already have it: This will give you the ability to use devcon.exe which allows manipulation of devices via batch scripts: https://docs.microsoft.com/en-us/windows-hardware/drivers/download-the-wdk

You might also need the Windows SDK if you don't have visual studio with Desktop development with C++ workload: https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk

To make things easier, you might want to add the installation path to your PATH environment variable: https://www.howtogeek.com/118594/how-to-edit-your-system-path-for-easy-command-line-access/

Devcon.exe was installed here for me:

C:\Program Files (x86)\Windows Kits\10\Tools\x64

So now run this or similar in a cmd.exe prompt to get the device id:

devcon findall * | find /i "nvidia"

Here is what mine looks like:

C:\Users\Soenhay>devcon findall * | find /i "nvidia"
HDAUDIO\FUNC_01&VEN_10DE&DEV_0083&SUBSYS_38426674&REV_1001\5&1C277AD4&0&0001: NVIDIA High Definition Audio
SWD\MMDEVAPI\{0.0.0.00000000}.{574980C3-9747-42EF-A78C-4C304E070B81}: SAMSUNG (NVIDIA High Definition Audio)
ROOT\UNNAMED_DEVICE\0000                                    : NVIDIA Virtual Audio Device (Wave Extensible) (WDM)
PCI\VEN_10DE&DEV_1B81&SUBSYS_66743842&REV_A1\4&1F1337ch33s3&0&0000: NVIDIA GeForce GTX 1070

From that I see that my graphics device id is:

PCI\VEN_10DE&DEV_1B81&SUBSYS_66743842&REV_A1\4&1F1337ch33s3&0&0000

So I create a batch file with the following to disable and re-enable the driver:

devcon disable "@PCI\VEN_10DE&DEV_1B81&SUBSYS_66743842&REV_A1\4&1F1337ch33s3&0&0000"
devcon enable "@PCI\VEN_10DE&DEV_1B81&SUBSYS_66743842&REV_A1\4&1F1337ch33s3&0&0000"

Now, when I get the NVML error when starting the miner I just run this batch file and it fixes it. You could also just add those 2 lines to the beginning of your start.bat file to do this every time but I found that the error does not always happen every time I restart the miner time now.

References:

superuser post

devcon commands

devcon examples

No matching devices found.

NOTE: The command should have the @ symbol at the beginning of the device id. The batch script should be run as administrator.

Question 4

I have faced the same error.

Found a solutions is to run command:

nvidia-uninstall