I guess the problem is that your counter is read and incremented by multiple threads without any synchronization. As a result, several threads can read the same counter value and use it as the same index into the array, overwriting each other's entries. You should instead use int atomicAdd(int* address, int val) to increment the counter. The code would look like this:
// counter is an int* pointing to global memory; reserve two consecutive slots
int oldCounter = atomicAdd(counter, 2);
devWords[oldCounter] = word_index;
devWords[oldCounter + 1] = start;
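Note that I incremented the counter by 2 before accessing the array. atomicAdd(...) returns the old value of the counter, i.e. the value before the addition, which I then used as the index into the array. Because every call returns a different old value, each thread gets its own pair of slots.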
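To see the pattern end to end, here is a minimal, self-contained sketch. Everything except devWords, counter, word_index and start is an assumption made for illustration: the kernel name appendPairs, the helper runDemo, the filter "thread i emits a pair when i % 3 == 0", and the placeholder values word_index = i and start = 2 * i are all hypothetical. The host code then checks that every pair it reads back is intact, which a racy read-then-increment counter would not guarantee.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: thread i "finds a word" when i % 3 == 0 and appends the
// pair (word_index, start) = (i, 2 * i). Filter and values are placeholders.
__global__ void appendPairs(int* devWords, int* counter, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || i % 3 != 0) return;

    int word_index = i;
    int start = 2 * i;

    // Atomically reserve two consecutive slots. atomicAdd returns the value the
    // counter held before the addition, so no two threads get the same oldCounter.
    int oldCounter = atomicAdd(counter, 2);
    devWords[oldCounter] = word_index;
    devWords[oldCounter + 1] = start;
}

// Hypothetical helper: runs the kernel and returns the number of ints written.
// Sets *pairsOk to true if every pair satisfies start == 2 * word_index and
// every word_index appears at most once.
int runDemo(int n, bool* pairsOk)
{
    int* devWords = nullptr;
    int* counter = nullptr;
    cudaMalloc(&devWords, 2 * n * sizeof(int));  // room for up to n pairs
    cudaMalloc(&counter, sizeof(int));
    cudaMemset(counter, 0, sizeof(int));        // counter starts at 0

    appendPairs<<<(n + 255) / 256, 256>>>(devWords, counter, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    int written = 0;
    cudaMemcpy(&written, counter, sizeof(int), cudaMemcpyDeviceToHost);
    std::vector<int> words(written);
    cudaMemcpy(words.data(), devWords, written * sizeof(int), cudaMemcpyDeviceToHost);

    std::vector<int> seen(n, 0);
    bool ok = true;
    for (int k = 0; k + 1 < written; k += 2) {
        int w = words[k];
        // Out of range, mismatched partner, or duplicate => a pair was corrupted.
        if (w < 0 || w >= n || words[k + 1] != 2 * w || seen[w]++) ok = false;
    }
    *pairsOk = ok;

    cudaFree(devWords);
    cudaFree(counter);
    return written;
}

int main()
{
    bool ok = false;
    int written = runDemo(1000, &ok);
    std::printf("%d ints written (%d pairs), pairs consistent: %s\n",
                written, written / 2, ok ? "yes" : "no");
    return ok ? 0 : 1;
}
```

With n = 1000, 334 threads pass the filter, so the counter should end at 668. The order of the pairs in devWords changes from run to run, since it depends on which thread wins each atomicAdd, but every pair stays together. The buffer here is sized for the worst case; in real code you would also make sure oldCounter + 1 stays within the bounds of devWords.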
Atomic operations on the same address are, however, serialized, which means that the increments of the counter cannot run in parallel. The rest of the code still runs in parallel, though.