Question

I'm trying to figure out how to express the equivalent of OpenMP's for reduction() in CUDA. I've done some research online, but nothing I've tried has worked. The code:

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++)
    {
        float f = ...  //store return from function to f
        out[i] = f;    //store f to out[i]
        sum += f;      //add f to sum and store in sum
    }

I know what for reduction() does in OpenMP: it makes the last line of the for loop possible. But how can I express the same thing in CUDA?

Thanks!

Solution

Use Thrust, an STL-inspired library that ships with CUDA. See the Thrust Quick Start Guide for examples of how to perform reductions.
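As a rough illustration of how the loop from the question might map onto Thrust, here is a minimal sketch. It assumes the per-element value depends only on the index i. The functor compute_f and the helper fill_and_sum are hypothetical names, and the body 0.5f * i is only a placeholder for the function elided in the question. The sketch uses thrust::transform to fill out and thrust::reduce to compute the sum, so it takes two passes over the data rather than one.

```cuda
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/reduce.h>
#include <thrust/transform.h>

// Hypothetical stand-in for the function elided in the question;
// replace the body with the real per-element computation.
struct compute_f
{
    __host__ __device__
    float operator()(int i) const { return 0.5f * i; }
};

// Hypothetical helper: writes out[i] = compute_f(i) on the device,
// then returns the sum of all elements of out.
float fill_and_sum(thrust::device_vector<float>& out)
{
    const int n = static_cast<int>(out.size());

    // Plays the role of the loop index i = 0 .. n-1.
    thrust::counting_iterator<int> first(0);

    // Equivalent of "out[i] = f;" for every i.
    thrust::transform(first, first + n, out.begin(), compute_f());

    // Equivalent of "sum += f;" with reduction(+:sum).
    return thrust::reduce(out.begin(), out.end(), 0.0f, thrust::plus<float>());
}
```

Calling fill_and_sum on a thrust::device_vector<float> of size N leaves the per-element results in device memory and returns the total to the host. If out is not needed afterwards, thrust::transform_reduce can compute the sum in a single pass without storing the intermediate values.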

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow