Question

Is it possible to remotely execute a CUDA profile execution (similar to computeprof) and then bring the profile back for analysis?

The particular remote machine is headless and not-under-my-control, so no X, no Qt libraries, etc.

Solution

Yes, you can. The CUDA driver has built-in profiling facilities. How to use them is described in the Compute_Profiler.txt file in the doc directory of the toolkit, but the basic idea is something like this:

$ COMPUTE_PROFILE=1 COMPUTE_PROFILE_CSV=1 COMPUTE_PROFILE_LOG=log.csv COMPUTE_PROFILE_CONFIG=config.txt ./app

which tells the runtime to turn on profiling, write the output in CSV format to log.csv, and collect the profile statistics listed in config.txt. After the app has run, the runtime will drop an output file containing the raw profiling results. You can then use the tool of your choice to look at them. The visual profiler can be convinced to open the output, but a lot of the fancy synchronization it does requires the output to be generated using its own profile configuration files (under the hood it does the same thing you are doing manually, but on the fly). I have done some digging around and scraped copies of those configuration files so I could regenerate specific application profiling runs on headless cluster nodes without the profiler. Not too much fun, but it can be done.
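To make the round trip concrete, here is a rough sketch of how the run-and-fetch step could be scripted from your own machine. It is not taken from the toolkit documentation: REMOTE and REMOTE_DIR are hypothetical placeholders, ./app is the binary from the command above, and the option names in config.txt are ones I believe the command-line profiler accepts, so check them against Compute_Profiler.txt for your toolkit version.

```shell
#!/bin/sh
# Sketch: profile a CUDA app on a headless host and copy the log back.
# REMOTE and REMOTE_DIR are hypothetical placeholders, set by the caller.
REMOTE="${REMOTE:-}"            # e.g. user@cluster-node
REMOTE_DIR="${REMOTE_DIR:-.}"   # directory on the remote host that holds ./app

# Environment settings copied from the command in the answer.
PROFILE_ENV="COMPUTE_PROFILE=1 COMPUTE_PROFILE_CSV=1 COMPUTE_PROFILE_LOG=log.csv COMPUTE_PROFILE_CONFIG=config.txt"

# Profiler options, one per line (assumed names; verify in Compute_Profiler.txt).
cat > config.txt <<'EOF'
gpustarttimestamp
gridsize
threadblocksize
regperthread
EOF

if [ -n "$REMOTE" ]; then
    # Push the config, run the app with profiling enabled, pull the log back.
    scp config.txt "$REMOTE:$REMOTE_DIR/config.txt"
    ssh "$REMOTE" "cd $REMOTE_DIR && $PROFILE_ENV ./app"
    scp "$REMOTE:$REMOTE_DIR/log.csv" ./log.csv
else
    echo "REMOTE not set; wrote config.txt only"
fi
```

Run it as something like `REMOTE=user@node REMOTE_DIR=/path/to/app sh profile.sh` (script name hypothetical). Only ssh and scp are needed on your side, and nothing graphical is needed on the remote side.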

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow