Question

I have a global function that receives an array and an index into the array. The function needs to find words from a dictionary and where each one starts in a given sequence.

But I see that the threads overwrite each other's results, so I guess it is because of a memory race. What can I do?

__global__ void find_words(int* dictionary, int dictionary_size, int* indeces,
                           int indeces_size, int* sequence, int sequence_size,
                           int longest_word, int* devWords, int* counter)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    int start = id * (CHUNK_SIZE - longest_word);
    int finish = start + CHUNK_SIZE;
    int word_index = -1;

    if (finish > sequence_size)
    {
        finish = sequence_size;
    }
    // search in a closed area
    while (start < finish)
    {
        find_word_in_phoneme_dictionary_kernel(dictionary, dictionary_size,
                indeces, indeces_size, sequence, &word_index, start, finish);

        if (word_index >= 0 && word_index <= indeces[indeces_size - 1])
        {
            devWords[*counter]     = word_index;
            devWords[*counter + 1] = start;      // index in sequence
            *counter += 2;
            start += dictionary[word_index];
        }
        else
        {
            start++;
        }
    }
    __syncthreads();
}

I also tried giving each thread its own array and counter to store its results, and then collecting the results from all threads, but I don't understand how to implement the gather in CUDA. Any help?

Solution

The problem is most likely that your counter is read and incremented by multiple threads without any synchronization. As a result, several threads can read the same counter value and use it as the index into the array, so they write to the same slots. You should instead use int atomicAdd(int* address, int val); to increment the counter. The code would look like this:

int oldCounter = atomicAdd(counter, 2);
devWords[oldCounter]   = word_index;
devWords[oldCounter+1] = start;

Note that I increment the counter before accessing the array. atomicAdd(...) returns the old value of the counter, which I then use as the index into the array. Atomic operations are serialized, however, which means that the increments of the counter cannot run in parallel. The rest of the code still runs in parallel, though.
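
To see the reserve-then-write pattern in isolation, here is a minimal sketch. It does not use the search code from the question: append_pairs and run_append_demo are hypothetical names, and each thread simply writes a made-up pair where the real kernel would write word_index and start.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the search kernel: every thread "finds" exactly
// one result and appends a (value, position) pair to a shared output array.
__global__ void append_pairs(int *out, int *counter, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;

    // Reserve two consecutive slots. atomicAdd returns the value the counter
    // had before the addition, so no two threads get the same base index.
    int base = atomicAdd(counter, 2);
    out[base]     = id * 10;  // stands in for word_index
    out[base + 1] = id;       // stands in for start
}

// Hypothetical host helper: runs the kernel for n threads, copies the 2 * n
// output values into host_out, and returns the final counter value.
int run_append_demo(int n, int *host_out)
{
    int *dev_out = nullptr;
    int *dev_counter = nullptr;
    cudaMalloc(&dev_out, 2 * n * sizeof(int));
    cudaMalloc(&dev_counter, sizeof(int));
    cudaMemset(dev_counter, 0, sizeof(int));  // counter must start at zero

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    append_pairs<<<blocks, threads>>>(dev_out, dev_counter, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    int counter = 0;
    cudaMemcpy(&counter, dev_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(host_out, dev_out, 2 * n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev_out);
    cudaFree(dev_counter);
    return counter;
}
```

Calling run_append_demo(1000, buffer) with a buffer of 2000 ints should return 2000, with every pair intact. The order in which the pairs appear in the array depends on thread scheduling, so sort on the host if you need them ordered by position. The output array also has to be large enough for the worst case, because nothing in this sketch checks the reserved index against the array size.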

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow