Question

Recently I began extending a heavily Boost-dependent project to use CUDA for its innermost loop. I thought it would be worth posting about some odd behaviour I've been seeing, though: simply including certain Boost headers causes my first CUDA call to generate a large number of kernel launches.

If I compile and debug the following code (simplestCase.cu):

#include <boost/thread.hpp>
#include <cuda_runtime.h>   // included explicitly for clarity; nvcc pulls this in automatically for .cu files

int main(int argc, char **argv){
    int *myInt;
    cudaMalloc(&myInt, sizeof(int));   // first CUDA call; this is where the launches appear
    cudaFree(myInt);
    return 0;
}

I get the following debug messages when the cudaMalloc executes (the same behaviour occurs if I launch a kernel I've defined; it seems that anything which triggers context creation produces them):

[Launch of CUDA Kernel 0 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 1 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 2 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 3 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 4 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 5 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 6 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 7 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 8 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
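For reference, here is a minimal sketch that forces context creation without allocating anything (cudaFree(0) is a common idiom for this; I'm assuming the first runtime call of any kind behaves the same way):

#include <boost/thread.hpp>   // unused, but its mere inclusion is what I'm testing
#include <cuda_runtime.h>

int main(){
    // cudaFree(0) forces lazy context creation without allocating any
    // device memory; if the launches above really come from context
    // creation, they should show up here as well.
    cudaFree(0);
    return 0;
}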

So far I have identified two headers that cause the problem: boost/thread.hpp and boost/mpi.hpp.

Here's a bit of info that may be useful in replicating the problem:

  • IDE: Nsight Eclipse Edition
  • OS: Ubuntu 12.04 x64
  • GPU: GeForce GTX 580 (I believe my GeForce GT 520 is being used by the OS)
  • Boost version: 1.52
  • cat /proc/driver/nvidia/version:
    • NVRM version: NVIDIA UNIX x86_64 Kernel Module 310.32 Mon Jan 14 14:41:13 PST 2013
    • GCC version: gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

Project settings:

  • Properties->Build->CUDA->DeviceLinkerMode = Separate Compilation
  • Properties->Build->CUDA->GenerateGPUCode = 2.0
  • Properties->Build->Settings->ToolSettings->NVCCLinker->Libraries = boost_system
  • Properties->Name = simplest_case_example

I think that's everything.

Edit:

Thank you for bringing to my attention the fact that I hadn't actually asked a question. I knew I was forgetting something critical. My question is this:

It seems odd to me that very specific includes generate peripheral kernel launches on their own, particularly since I don't use anything from those headers and I don't see how they could affect my interaction with CUDA. Should CUDA be launching this many extra kernels for code I'm not even using? In the project I'm working on now I see over 100 kernels launched, when the only CUDA-related code in the project is a single cudaMalloc at the program's entry point.
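As a sanity check (just a sketch using standard runtime calls, nothing from my actual project), I can at least confirm that the extra launches don't leave the context in an error state and that the allocation itself succeeds:

#include <boost/thread.hpp>
#include <cuda_runtime.h>
#include <cstdio>

int main(){
    int *myInt = NULL;
    cudaError_t err = cudaMalloc((void **)&myInt, sizeof(int));   // this is where the extra launches appear
    printf("cudaMalloc:            %s\n", cudaGetErrorString(err));
    printf("cudaGetLastError:      %s\n", cudaGetErrorString(cudaGetLastError()));
    printf("cudaDeviceSynchronize: %s\n", cudaGetErrorString(cudaDeviceSynchronize()));
    cudaFree(myInt);
    return 0;
}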

Edit2:

This also happens on a Tesla K20 (a Kepler-architecture card, whereas I believe the GTX 580 is Fermi).

Edit3:

Updated the CUDA driver to version 319.23. No change in the behaviour mentioned above, but it did fix the debugger issues I was having in larger programs.


Solution

Well, still no actual issues have arisen from this, so I suppose it's simply work that happens in the background.
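If anyone wants to convince themselves it is harmless, here is a rough sketch that times the first runtime call (which absorbs context creation and these background launches) against a second one; the seconds() helper is just something I'm adding for this sketch, and the numbers are only indicative:

#include <boost/thread.hpp>
#include <cuda_runtime.h>
#include <sys/time.h>
#include <cstdio>

// small wall-clock timing helper (Linux-only; part of this sketch)
static double seconds(){
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(){
    double t0 = seconds();
    cudaFree(0);   // first runtime call: context creation plus the background launches
    double t1 = seconds();
    cudaFree(0);   // context already exists, so this should be cheap
    double t2 = seconds();

    printf("first call:  %.3f ms\n", (t1 - t0) * 1000.0);
    printf("second call: %.3f ms\n", (t2 - t1) * 1000.0);
    return 0;
}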

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow