Question

I'm having trouble with a very basic CUDA program. It multiplies two vectors on the host and on the device and then compares the results, and that part works without a problem. The trouble starts when I vary the number of threads and blocks, which I'm doing for learning purposes. I have the following kernel:

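__global__ void multiplyVectorsCUDA(float *a, float *b, float *c, int N){
    // With a single block, threadIdx.x alone is a unique element index.
    int idx = threadIdx.x;
    if (idx < N)  // threads past the end of the vectors do nothing
        c[idx] = a[idx] * b[idx];
}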

which I launch like this:

multiplyVectorsCUDA<<<nBlocks, nThreads>>>(vector_a_d, vector_b_d, vector_c_d, N);

For the moment I've fixed nBlocks to 1, so I only vary the vector size N and the number of threads nThreads. From what I understand, there will be one thread per multiplication, so N and nThreads should be equal.
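For reference, a sketch (not from the original post) of how the one-thread-per-element idea generalizes beyond a single block. Because of the `if (idx<N)` guard, one block only needs nThreads >= N rather than exactly N; with nThreads < N, elements nThreads through N-1 of c are simply never written. A block is also limited in size (1024 threads on current GPUs), so larger vectors need nBlocks * nThreads >= N together with a global index. The names `multiplyVectorsGrid`, `gridSizeFor` and `launchMultiply` are hypothetical:

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical variant of multiplyVectorsCUDA: the global index combines the
// block and thread indices, so the same kernel works with any nBlocks.
__global__ void multiplyVectorsGrid(const float *a, const float *b, float *c, int N) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)  // the last block may contain threads past the end of the data
        c[idx] = a[idx] * b[idx];
}

// Smallest number of blocks with nBlocks * nThreads >= N (ceiling division).
int gridSizeFor(int N, int nThreads) {
    assert(N >= 0 && nThreads > 0);
    return (N + nThreads - 1) / nThreads;
}

// Launches one thread per element and returns the first error reported,
// either by the launch itself or while the kernel was running.
cudaError_t launchMultiply(const float *a_d, const float *b_d, float *c_d,
                           int N, int nThreads) {
    if (N == 0) return cudaSuccess;            // a 0-block launch is invalid
    int nBlocks = gridSizeFor(N, nThreads);
    multiplyVectorsGrid<<<nBlocks, nThreads>>>(a_d, b_d, c_d, N);
    cudaError_t err = cudaGetLastError();      // e.g. too many threads per block
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();            // waits and reports runtime errors
}
```

With N=16 and nThreads=5 this launches 4 blocks (20 threads), and the guard stops the 4 surplus threads from writing past the end of c. Checking `cudaGetLastError()` and `cudaDeviceSynchronize()` separates a launch that failed from a kernel that ran but produced wrong values.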

The problem is the following:

  1. I first call the kernel with N=16 and nThreads<16, which doesn't work. (This is expected.)
  2. Then I call it with N=16 and nThreads=16, which works fine. (Again, as expected.)
  3. But when I then call it again with N=16 and nThreads<16, it still works!

I don't understand why the last step doesn't fail like the first one. It only fails again if I restart my PC.

Has anyone run into something like this before, or can anyone explain this behavior?


Solution 2

I don't know if it's OK to answer my own question, but I realized I had a bug in the code that compares the host and device vectors (that part of the code wasn't posted). Sorry for the inconvenience. Could someone please close this post, since it won't let me delete it?
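The comparison code was never posted, so the following does not reproduce the actual bug. It is a hypothetical host-side check (the name `firstMismatch` and the tolerance parameter are assumptions) that compares all N elements, not just the first nThreads, and treats a NaN as a mismatch:

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>

// Returns the index of the first element where the device result differs
// from the host reference by more than tol, or -1 if all N elements match.
// Writing the test as !(diff <= tol) makes a NaN count as a mismatch,
// because every comparison involving NaN is false.
int firstMismatch(const float *expected, const float *actual, int N, float tol) {
    assert(N >= 0 && tol >= 0.0f);
    for (int i = 0; i < N; ++i) {
        if (!(std::fabs(expected[i] - actual[i]) <= tol)) {
            std::printf("mismatch at %d: host %f, device %f\n",
                        i, expected[i], actual[i]);
            return i;
        }
    }
    return -1;
}
```

It would be called after copying the device result back, for example with `cudaMemcpy(vector_c_h, vector_c_d, N * sizeof(float), cudaMemcpyDeviceToHost)`, where `vector_c_h` is a hypothetical host buffer.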

Other tips

Wait, so are you calling all three in a row? I don't know the rest of your code, but are you sure you're clearing out the GPU memory you allocated between runs? If not, that could explain why it doesn't work the first time but does the third time when you pass the same values, and why it only fails again after rebooting (rebooting clears all the allocated memory).
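One way to make stale memory visible, sketched here as an assumption rather than taken from the thread: `cudaMalloc` does not clear the memory it returns, so instead of relying on its contents, fill the output buffer with a known pattern before each launch. Setting every byte to 0xFF makes each float a NaN, so elements the kernel never wrote can be counted after copying back. The helper names `poisonOutput` and `countUnwritten` are hypothetical:

```cuda
#include <cassert>
#include <cmath>
#include <cuda_runtime.h>

// Fills the device output with 0xFF bytes; each 4-byte float becomes a NaN.
// Any element still NaN after the kernel runs was never written by it.
cudaError_t poisonOutput(float *c_d, int N) {
    assert(N >= 0);
    return cudaMemset(c_d, 0xFF, N * sizeof(float));
}

// Counts host-side elements that still hold the NaN fill, i.e. elements the
// kernel never wrote. Assumes the real inputs contain no NaNs.
int countUnwritten(const float *c_h, int N) {
    assert(N >= 0);
    int unwritten = 0;
    for (int i = 0; i < N; ++i)
        if (std::isnan(c_h[i])) ++unwritten;
    return unwritten;
}
```

A run would call `poisonOutput` after `cudaMalloc` and before the launch, copy c back, check that `countUnwritten` returns 0, and release each buffer with `cudaFree` at the end. With the fill in place, the third call in the question would report N - nThreads unwritten elements instead of passing on leftover values.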

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow